SELFISH ROUTING


Tim  Roughgarden,  Ph.D. 
Cornell  University  2002 


A  central  and  well-studied  problem  arising  in  the  management  of  a  large  network 
is  that  of  routing  traffic  to  achieve  the  best  possible  network  performance.  In  many 
networks,  it  is  difficult  or  even  impossible  to  impose  optimal  routing  strategies  on 
network  traffic,  leaving  network  users  free  to  act  according  to  their  own  interests. 
In general, the result of local optimization by many selfish network users with
conflicting interests does not possess any type of global optimality; hence, this lack of
regulation  carries  the  cost  of  decreased  network  performance. 

We  study  the  degradation  in  network  performance  due  to  selfish,  uncoordinated 
behavior  by  network  users.  Our  contributions  are  twofold:  we  quantify  the  worst- 
possible  loss  in  network  performance  arising  from  noncooperative  behavior;  and  we 
design  and  analyze  algorithms  for  building  and  managing  networks  so  that  selfish 
behavior  leads  to  a  socially  desirable  outcome. 

To quantify the loss in network performance caused by selfish behavior, we
investigate the following question: what is the worst-case ratio between the social cost
of  an  uncoordinated  outcome  and  the  social  cost  of  the  best  coordinated  outcome? 
We  provide  an  exact  solution  to  this  problem  in  a  variety  of  traffic  models,  thereby 
identifying  types  of  networks  for  which  the  cost  of  routing  selfishly  is  mild. 

The inefficiency inherent in an uncoordinated outcome and the inability to
centrally implement a globally optimal solution motivate our second question: how
can  we  ensure  that  the  loss  in  network  performance  due  to  selfish  behavior  will  be 
tolerable?  We  explore  two  algorithmic  approaches  to  coping  with  the  selfishness  of 
network  users.  First,  we  consider  the  problem  of  designing  networks  that  exhibit 
good performance when used selfishly. Second, we study how to route a small
fraction of the traffic centrally to induce “good” (albeit selfish) behavior from the rest
of  the  network  users.  We  give  both  efficient  algorithms  with  provable  performance 
guarantees  and  hardness  of  approximation  results  for  these  two  problems. 
Chapter 1
Introduction 


1.1  Selfish  Routing 

What  route  should  you  take  to  work  tomorrow?  All  else  being  equal,  most  of  us 
would  probably  opt  for  the  one  that  allows  us  to  wake  up  at  the  least  barbaric 
time — that  is,  most  of  us  would  prefer  the  shortest  route  available.  As  any  morning 
commuter  knows,  the  length  of  time  required  to  travel  along  a  given  route  depends 
crucially  on  the  amount  of  traffic  congestion — on  the  number  of  other  commuters 
who  choose  interfering  routes.  In  selecting  a  path  to  travel  from  home  to  work,  do 
you  take  into  account  the  additional  congestion  that  you  cause  other  commuters  to 
experience?  Not  likely.  Almost  certainly  you  choose  your  route  selfishly,  aiming  to 
get  to  work  as  quickly  as  possible  without  considering  the  adverse  effects  your  choice 
creates  for  others.  Naturally,  you  also  expect  your  fellow  commuters  to  behave  in 
a  similarly  egocentric  fashion.  But  what  if  all  of  you  cooperated  and  coordinated 
routes?  Is  it  possible  to  limit  the  interference  between  routes,  thereby  improving 
the  average  (or  the  maximum)  commute  time?  If  so,  by  how  much? 

In  this  thesis,  we  study  the  loss  of  social  welfare  due  to  selfish,  uncoordinated 
behavior  in  networks.  Our  contributions  are  twofold:  we  quantify  the  worst-possible 
loss  of  social  welfare  arising  from  noncooperative  behavior  in  a  variety  of  traffic 
models;  and  we  design  and  analyze  algorithms  for  building  and  managing  networks 
so  that  selfish  behavior  leads  to  a  socially  desirable  outcome.  Our  results  concern 
more  than  just  the  road  networks  described  in  the  previous  paragraph;  we  will 
see  that  they  also  have  consequences  for  high-speed  communication  networks  (with 
network users seeking to minimize the end-to-end delay experienced by their traffic).

Figure 1.1: Pigou’s example. A latency function $\ell(x)$ describes the delay experienced
by  drivers  on  a  road  as  a  function  of  the  fraction  of  overall  traffic  using  that  road. 

1.2  Two  Motivating  Examples 

To motivate the questions investigated in this thesis (which we will describe in
Section 1.3), let us informally explore two important examples. The first is essentially
due to Pigou in his 1920 treatise [148, p. 194] (see also Knight [99]), and the second
is  a  famous  “paradox”  discovered  by  Braess  in  1968  [28]. 

1.2.1  Pigou’s  Example 

Posit a suburb and a nearby train station (denoted by s and t, respectively)
connected by two non-interfering highways, and a fixed number of drivers who wish to
commute  from  s  to  t  at  roughly  the  same  time.  Suppose  the  first  highway  is  short 
but  narrow  with  the  delay  experienced  on  it  while  driving  from  s  to  t  increasing 
sharply with the number of drivers. Suppose the second is wide enough to
accommodate all of the traffic without any crowding but takes a long, circuitous route.
For concreteness, assume that all drivers on the latter highway require 1 hour to
drive from s to t (irrespective of the number of other drivers on the road), while
the  delay  (in  hours)  along  the  former  route  equals  the  fraction  of  the  overall  traffic 
choosing  to  use  it.  Pictorially,  we  are  discussing  the  network  of  Figure  1.1,  where 
the functions $\ell(\cdot)$ (which we will call latency functions) describe the latency or delay
experienced  by  drivers  on  a  road  as  a  function  of  the  fraction  of  the  overall  traffic 
using  that  road;  thus,  the  top  edge  in  Figure  1.1  represents  the  long  wide  highway, 
and  the  bottom  edge  the  short  narrow  road. 

Assuming that all drivers aim to minimize the time taken to drive from s to t,
we  have  good  reason  to  expect  all  traffic  to  follow  the  lower  road  and  therefore,  due  to 
the  ensuing  congestion,  to  incur  one  hour  of  delay  traveling  from  s  to  t.  Indeed,  any 
driver on the top road (experiencing 1 hour of latency) will soon become envious of
the  drivers  on  the  lower  route  (who  experience  less  than  1  hour  of  latency,  provided 
some  traffic  chooses  the  other  route)  and  will  change  his  or  her  opinion  about  which 
route  is  superior. 

Now  suppose  that,  by  whatever  means,  we  can  choose  who  drives  where.  Can  we 
improve  over  the  previous  “selfish”  outcome  with  the  power  of  centralized  control? 
To  see  that  we  can,  consider  the  outcome  of  assigning  half  of  the  traffic  to  each  of 
the  two  routes.  The  drivers  forced  onto  the  long,  wide  highway  experience  one  hour 
of  delay,  and  are  thus  no  worse  off  than  in  the  previous  outcome;  on  the  other  hand, 
drivers  allowed  to  use  the  short  narrow  road  now  enjoy  lighter  traffic  conditions, 
and  arrive  at  their  destination  after  a  mere  30  minutes.  We  have  thus  improved  the 
state  of  affairs  for  half  of  the  drivers  while  making  no  one  worse  off;  moreover,  the 
average  delay  experienced  by  traffic  has  dropped  from  60  to  45  minutes,  a  significant 
improvement. 
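
The arithmetic behind this comparison can be checked in a few lines. The following Python sketch is illustrative only: it assumes the total traffic is normalized to 1, and the function name `pigou_average_delay` is ours rather than notation from the thesis.

```python
def pigou_average_delay(x_narrow: float) -> float:
    """Average delay in hours when a fraction x_narrow of the traffic
    uses the short narrow road and the rest uses the long wide highway."""
    x_wide = 1.0 - x_narrow
    narrow_delay = x_narrow   # delay equals the fraction of traffic on the road
    wide_delay = 1.0          # one hour regardless of congestion
    return x_narrow * narrow_delay + x_wide * wide_delay


selfish = pigou_average_delay(1.0)      # all drivers on the narrow road
coordinated = pigou_average_delay(0.5)  # half of the traffic on each road

# Coarse grid search over splits, as a sanity check that the even split
# is the best one on this grid.
best_split = min((i / 1000 for i in range(1001)), key=pigou_average_delay)

print(f"selfish: {60 * selfish:.0f} min, coordinated: {60 * coordinated:.0f} min")
print(f"best split found on the grid: {best_split}")
```

The grid search is only a numerical check; it does not replace the formal treatment of minimum-latency flows later in the thesis.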

Pigou’s  example  demonstrates  a  principle  that  is  well-known  and  well-studied  in 
economics  and  game  theory:  selfish  behavior  by  independent,  noncooperative  agents 
need  not  produce  a  socially  desirable  outcome.  In  Part  II  of  this  thesis,  we  quantify 
this phenomenon in several traffic models by analyzing how much worse a selfishly-defined
outcome can be relative to the best outcome achievable with complete
coordination.

1.2.2  Braess’s  Paradox 

Pigou’s example illustrates an important principle, that the outcome of selfish
behavior need not optimize social welfare. However, it is perhaps unsurprising that
the  result  of  local  optimization  by  many  individuals  with  conflicting  interests  does 
not  possess  any  type  of  global  optimality.  The  next  example,  due  to  Braess  [28]  and 
subsequently  reported  by  Murchland  [128],  is  decidedly  less  intuitive. 

We  again  begin  with  a  suburb  s,  a  train  station  t,  and  a  fixed  number  of  drivers 
who wish to commute from s to t. For the moment, we will assume two
non-interfering routes from s to t, each comprising one long wide road and one short
narrow  road  as  shown  in  Figure  1.2(a).  By  symmetry,  in  a  selfishly-defined  outcome 
we  expect  each  of  the  two  routes  to  carry  half  of  the  overall  traffic,  so  that  all  drivers 
incur  90  minutes  of  latency  traveling  from  s  to  t. 

Now,  an  hour  and  a  half  is  quite  a  commute.  Suppose  that,  in  an  effort  to 
alleviate  these  unacceptable  delays,  we  harness  the  finest  available  road  technology 
to build a very short and very wide highway joining the midpoints of the two existing
routes. The new network is shown in Figure 1.2(b), with the new road represented
by edge (v,w) endowed with the constant latency function $\ell(x) = 0$ (independent
of the road congestion). How will the drivers react?

Figure 1.2: Braess’s Paradox. The addition of an intuitively helpful edge can
adversely affect all of the users of a congested network.

We  cannot  expect  the  previous  traffic  pattern  to  persist  in  the  new  network;  any 
driver  can  save  roughly  30  minutes  of  travel  time  (assuming  other  drivers  keep  their 
choices fixed) by following route $s \to v \to w \to t$. Suppose that all drivers, in their
haste to make use of the new road, simultaneously deviate from their previous routes
to instead follow the path $s \to v \to w \to t$. Because of the heavy congestion on
edges (s,v) and (w,t), all of these drivers now experience two hours of delay when
driving from s to t; moreover, this congestion also implies that neither of the two
alternative  routes  is  superior  and  thus  no  driver  has  an  incentive  to  change  routes. 
Even  worse,  any  other  traffic  pattern  is  unstable  in  the  sense  that  some  drivers  will 
have  an  incentive  to  switch  paths.  It  is  therefore  reasonable  to  expect  all  drivers  to 
follow path $s \to v \to w \to t$ in the selfishly-defined outcome in the new network
and  thus  experience  30  minutes  more  delay  than  in  the  original  network.  Braess’s 
Paradox  thus  shows  that  the  intuitively  helpful  (or  at  least  innocuous)  action  of 
adding  a  new  zero-latency  link  may  negatively  impact  all  of  the  traffic! 
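
The delays quoted in this example can be reproduced with a short calculation. The Python sketch below is illustrative and rests on an assumption: since Figure 1.2 is not reproduced here, we take the narrow roads (s,v) and (w,t) to have latency equal to the fraction of traffic using them and the wide roads (v,t) and (s,w) to have constant latency 1, which matches the 90-minute and two-hour figures in the text. The function name `braess_path_latencies` is ours.

```python
def braess_path_latencies(f_svt: float, f_swt: float, f_svwt: float) -> dict:
    """Path latencies in hours for given path flows (fractions of traffic).

    Assumed edge latencies: l(x) = x on (s,v) and (w,t); l(x) = 1 on
    (v,t) and (s,w); l(x) = 0 on the new edge (v,w).
    """
    x_sv = f_svt + f_svwt   # traffic on the narrow road (s,v)
    x_wt = f_swt + f_svwt   # traffic on the narrow road (w,t)
    return {
        "s-v-t": x_sv + 1.0,
        "s-w-t": 1.0 + x_wt,
        "s-v-w-t": x_sv + 0.0 + x_wt,
    }


# Old traffic pattern: half of the drivers on each original route.
# The "s-v-w-t" entry is what a single deviating driver would experience.
before = braess_path_latencies(0.5, 0.5, 0.0)

# New pattern: every driver uses the zero-latency shortcut.
after = braess_path_latencies(0.0, 0.0, 1.0)

print("before:", before)
print("after: ", after)
```

In the second pattern all three paths have the same latency, which is why no driver gains by switching routes.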

Braess’s Paradox raises some interesting issues. First, it furnishes a second
example of the suboptimality of selfishly-defined outcomes. Indeed, Braess’s example
demonstrates this principle in a stronger form than does Pigou’s, in that all drivers
would strictly prefer a coordinated outcome (namely, the original traffic pattern in
the network of Figure 1.2(a)) to the one obtained by acting non-cooperatively.¹ More
importantly,  Braess’s  Paradox  shows  that  the  interactions  between  selfish  behavior 
and  the  underlying  network  structure  defy  intuition  and  are  not  easy  to  predict. 
When  we  tackle  the  algorithmic  questions  of  how  to  design  and  manage  networks 
so  that  selfish  behavior  results  in  a  socially  desirable  outcome  (a  task  we  undertake 
in Part III of this thesis), we must bear in mind the moral of Braess’s paradox:
“bigger”  need  not  be  “better”. 

1.3  Our  Contributions 

To  describe  our  results  precisely,  we  must  be  more  formal  about  our  model  of  selfish 
routing  in  a  network.  We  consider  a  directed  network  in  which  each  edge  possesses 
a  latency  function  describing  the  common  latency  incurred  by  all  traffic  on  the  edge 
as  a  function  of  the  edge  congestion  (as  in  the  two  examples  of  Section  1.2).  We 
are  given  a  rate  of  traffic  between  each  ordered  pair  of  nodes  in  the  network;  in  the 
two  examples  of  Section  1.2  there  was  a  positive  traffic  rate  only  for  one  ordered 
pair,  but  we  will  also  be  interested  in  networks  where  different  users  have  different 
sources  and  destinations  (that  is,  in  multicommodity  networks).  We  aspire  toward 
an  assignment  of  traffic  to  paths  minimizing  the  sum  of  all  travel  times  (the  total 
latency)² of network users, although the examples of Section 1.2 demonstrate that
selfish  behavior  need  not  achieve  this  goal. 

We  assume  that  an  unregulated  network  user  will  always  choose  the  minimum- 
latency  path  from  its  source  to  its  destination  (given  the  link  congestion  caused  by 
the  rest  of  the  network  users).  As  the  route  chosen  by  one  network  user  affects 
the  congestion  (and  hence  the  latency)  experienced  by  others,  the  reader  familiar 
with  basic  game  theory  will  recognize  the  essential  ingredients  of  a  noncooperative 
game.  Motivated  by  this  analogy,  when  no  network  user  has  an  incentive  to  reroute 
its  traffic,  we  will  follow  the  conventions  of  noncooperative  game  theory  and  say 
that  the  network  is  at  Nash  equilibrium.  For  instance,  we  saw  that  the  network  of 
Pigou’s  example  (Figure  1.1)  is  at  Nash  equilibrium  when  all  traffic  is  routed  on 

¹ This stronger form is also well known in the game theory literature; perhaps its most famous
manifestation  occurs  in  the  so-called  “Prisoner’s  Dilemma”  [56,  150]. 

² Minimizing the average (rather than total) travel time may strike the reader as a more natural
objective;  however,  these  two  objective  functions  differ  only  by  a  normalizing  constant  (namely, 
the  amount  of  network  traffic)  and  are  therefore  equivalent  for  our  purposes.  We  work  with  total 
latency  for  technical  convenience. 


the  bottom  link,  while  the  network  of  Braess’s  Paradox  (Figure  1.2(b))  is  at  Nash 
equilibrium when all traffic follows the path $s \to v \to w \to t$. We can view a
Nash  equilibrium  as  a  natural  operating  point  of  a  network  in  which  users  are  not 
centrally  controlled  and  route  their  traffic  selfishly — in  the  language  of  game  theory, 
as  a  natural  outcome  of  “rational  behavior”. 

Finally,  we  will  assume  that  each  network  user  controls  a  negligible  fraction  of 
the  overall  traffic;  assignments  of  traffic  to  paths  in  the  network  can  then  be  modeled 
in  a  continuous  manner  by  network  flow,  with  the  amount  of  flow  between  a  pair 
of  nodes  in  the  network  equal  to  the  rate  of  traffic  between  the  two  nodes.  A  Nash 
equilibrium  then  corresponds  to  a  flow  in  which  all  flow  paths  between  a  given  source 
and  destination  have  minimum  latency  (if  a  flow  does  not  have  this  property,  some 
traffic  can  improve  its  travel  time  by  switching  from  a  longer  path  to  a  shorter  one). 
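
The equilibrium condition just stated is easy to test mechanically. The following Python sketch is a simplified illustration for a single source-destination pair; it assumes the path latencies have already been evaluated at the given flow, and the name `is_nash_flow` and the tolerance parameter are ours.

```python
def is_nash_flow(path_flow: dict, path_latency: dict, tol: float = 1e-9) -> bool:
    """Return True if every path carrying flow has minimum latency.

    path_flow maps each path to the amount of traffic it carries;
    path_latency maps each path to its latency at that flow.
    """
    shortest = min(path_latency.values())
    return all(
        path_latency[p] <= shortest + tol
        for p, flow in path_flow.items()
        if flow > tol
    )


# Pigou's example: all traffic on the lower link (latency 1 on both links).
all_lower = is_nash_flow({"upper": 0.0, "lower": 1.0},
                         {"upper": 1.0, "lower": 1.0})

# Even split: the upper link carries flow but is slower than the lower link.
even_split = is_nash_flow({"upper": 0.5, "lower": 0.5},
                          {"upper": 1.0, "lower": 0.5})

print(all_lower, even_split)
```

A multicommodity version would apply the same check separately to the paths of each source-destination pair.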

Remark 1.3.1 While our examples have been phrased in the language of road
networks, we emphasize that our model and results apply equally well to high-speed
communication networks. The reader familiar with standard Internet routing
protocols might object that, in most current networks, users cannot select paths for their
traffic  and  are  instead  at  the  mercy  of  the  network  routers.  As  Friedman  [74]  points 
out,  however,  the  paths  used  by  routers  are  typically  computed  by  a  distributed 
shortest-path  computation  [98]  and,  provided  end-to-end  link  delay  is  used  as  the 
metric  on  the  network  links,  the  only  stable  routings  of  traffic  are  Nash  equilibria. 
Of  course,  neither  communication  nor  road  networks  need  exhibit  stable  behavior 
even  when  all  traffic  rates  are  held  fixed;  nevertheless,  we  believe  that  the  study 
of  Nash  equilibria  is  a  natural  first  step  in  understanding  the  behavior  of  actual 
networks. 

1.3.1  Bounding  the  Price  of  Anarchy 

In Chapters 3 and 4 of this thesis, we study the degradation in network
performance caused by the selfish behavior of noncooperative network users in a variety
of  traffic  models.  Motivated  by  work  of  Koutsoupias  and  Papadimitriou  [108]  in  a 
different  context,  we  quantify  this  degradation  with  the  following  question:  what 
is  the  worst-case  ratio  between  the  total  latency  of  a  Nash  equilibrium  and  that  of 
the best coordinated outcome, that is, of a flow minimizing the total latency? This
question carries particular importance for networks in which Nash equilibria are not
too  inefficient — proving  strong  upper  bounds  on  this  worst-case  ratio  obviates  the 
need for centralized control (provided network users do indeed act in a purely selfish
manner).
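
As a small illustration of the ratio in question, the sketch below evaluates it for the two examples of Section 1.2, using the delays quoted there (in hours, with total traffic normalized to 1). It is a hedged example: for Braess's network we take the 90-minute traffic pattern as the coordinated benchmark, and the helper name `cost_ratio` is ours.

```python
from fractions import Fraction


def cost_ratio(nash_cost: Fraction, coordinated_cost: Fraction) -> Fraction:
    """Ratio between the total latency of a selfish outcome and that of
    a coordinated outcome, kept exact by using rational arithmetic."""
    return Fraction(nash_cost) / Fraction(coordinated_cost)


# Pigou's example: 60 minutes when routing selfishly, 45 when coordinated.
pigou_ratio = cost_ratio(Fraction(1), Fraction(3, 4))

# Braess's Paradox: 2 hours when routing selfishly, 90 minutes otherwise.
braess_ratio = cost_ratio(Fraction(2), Fraction(3, 2))

print(pigou_ratio, braess_ratio)
```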

Computing  the  Price  of  Anarchy 

We  prove  sharp  upper  bounds  on  this  worst-case  ratio  (recently  dubbed  “the  price 
of  anarchy”  by  Papadimitriou  [142])  for  networks  in  which  edge  latency  does  not 
depend  in  a  highly  nonlinear  fashion  on  the  edge  congestion.  We  can  therefore 
conclude  that  the  cost  of  foregoing  centralized  control  in  such  networks  is  mild.  For 
example,  we  prove  the  following. 

•  In networks with latency functions that are polynomials with nonnegative
coefficients and degree at most p, the price of anarchy is $[1 - p \cdot (p+1)^{-(p+1)/p}]^{-1}$,
which is asymptotically $\Theta(p / \ln p)$ as $p \to \infty$. The bound of $4/3$ (when p = 1) for
networks with latency functions of the form $\ell(x) = ax + b$ for $a, b \ge 0$ shows
that Pigou’s example and Braess’s Paradox (see Section 1.2) are worst-case
examples for the inefficiency of Nash equilibria in such networks.³

•  In networks with latency functions corresponding to the expected waiting time
of an M/M/1 queue (functions of the form $\ell(x) = (u - x)^{-1}$ where u denotes
an edge capacity or queue service rate), the price of anarchy is bounded if and
only if the maximum allowable amount of traffic $R_{\max}$ is constrained to be
less than the minimum allowable edge capacity $u_{\min}$; in this case, the price of
anarchy is $\left(1 + \sqrt{u_{\min} / (u_{\min} - R_{\max})}\right) / 2$.

The  first  result  demonstrates  that  the  cost  of  routing  selfishly  depends  crucially 
on the “steepness” of the network latency functions. The second has the following
intuitive interpretation: since the worst-case ratio approaches 1 as $u_{\min}/R_{\max} \to +\infty$
and approaches $+\infty$ as $u_{\min}/R_{\max} \to 1$, the price of routing selfishly in a network
with  M/M/1  delay  functions  is  always  tolerable  provided  the  network  capacity  is 
sufficiently  large  relative  to  the  demand  for  bandwidth,  and  may  be  intolerable 
otherwise. 
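
To get a feel for these two bounds, one can simply evaluate them. The Python sketch below reads the closed forms as $[1 - p (p+1)^{-(p+1)/p}]^{-1}$ for polynomials of degree at most p and $(1 + \sqrt{u_{\min}/(u_{\min} - R_{\max})})/2$ for M/M/1 delay functions; the function names `poa_polynomial` and `poa_mm1` are ours and are used for illustration only.

```python
import math


def poa_polynomial(p: int) -> float:
    """Price of anarchy for polynomial latency functions with nonnegative
    coefficients and degree at most p (p >= 1)."""
    return 1.0 / (1.0 - p * (p + 1) ** (-(p + 1) / p))


def poa_mm1(u_min: float, r_max: float) -> float:
    """Price of anarchy for M/M/1 delay functions; unbounded unless the
    maximum traffic r_max is below the minimum edge capacity u_min."""
    if r_max >= u_min:
        return math.inf
    return 0.5 * (1.0 + math.sqrt(u_min / (u_min - r_max)))


for p in (1, 2, 4, 8, 16):
    print(f"degree {p:2d}: {poa_polynomial(p):.4f}")

for ratio in (1.01, 2.0, 10.0, 100.0):
    print(f"u_min/R_max = {ratio:6.2f}: {poa_mm1(ratio, 1.0):.4f}")
```

The first loop shows the bound growing with the degree, and the second shows it approaching 1 as capacity grows relative to demand.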

A  Bicriteria  Bound  for  Arbitrary  Latency  Functions 

Our  work  above  shows  that  Nash  equilibria  may  incur  much  more  latency  than 
minimum-latency  flows  in  networks  with  latency  functions  that  exhibit  steep  growth 

³ Throughout this thesis, we will call such latency functions linear while admitting that affine
would  be  a  more  accurate  adjective. 
(such as networks with M/M/1 delay functions). Since such latency functions are
common  in  important  applications,  such  as  in  routing  in  the  Internet  and  other 
communication  networks  [20,  98],  we  would  nevertheless  like  to  meaningfully  bound 
the inefficiency of Nash equilibria in networks with arbitrarily steep latency
functions. Toward this end, we consider bicriteria results. In particular, we compare
the  total  latency  of  a  Nash  equilibrium  with  that  of  a  minimum-latency  flow  that 
routes  additional  traffic  between  each  pair  of  nodes.  We  prove  that  in  a  network 
with  latency  functions  assumed  only  to  be  continuous  and  nondecreasing,  the  total 
latency  incurred  by  traffic  at  Nash  equilibrium  is  at  most  that  of  a  minimum-latency 
flow  forced  to  route  twice  as  much  traffic  between  each  source-destination  pair.  This 
result  has  an  alternative  interpretation:  in  lieu  of  centralized  control,  the  price  of 
routing  selfishly  can  be  offset  by  a  moderate  increase  in  link  speed  (which  for  the 
M/M/1 delay functions $\ell(x) = (u - x)^{-1}$ mentioned above can be effected by
doubling the capacity u of every edge).
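
The bicriteria bound can be observed numerically on a small instance. The Python sketch below is an illustration under stated assumptions, not part of the thesis's argument: it uses a two-link network in the spirit of Pigou's example with latency functions $\ell(x) = x^p$ and $\ell(x) = 1$ (the choice of $x^p$ is ours), at traffic rate 1. At Nash equilibrium all traffic uses the $x^p$ link, for a total latency of 1. The helper `min_total_latency` approximates the minimum total latency by a grid search.

```python
def min_total_latency(rate: float, p: int, steps: int = 20000) -> float:
    """Approximate minimum total latency on a two-link network with
    latencies l(x) = x**p and l(x) = 1, by grid search over the split."""
    best = float("inf")
    for i in range(steps + 1):
        x = rate * i / steps                  # traffic on the x**p link
        cost = x * x ** p + (rate - x) * 1.0  # flow times latency, per link
        best = min(best, cost)
    return best


NASH_COST = 1.0  # all traffic on the x**p link at rate 1, latency 1**p = 1

for p in (1, 5, 25):
    opt_same_rate = min_total_latency(1.0, p)
    opt_double_rate = min_total_latency(2.0, p)
    print(f"p = {p:2d}: Nash/opt at rate 1 = {NASH_COST / opt_same_rate:.2f}, "
          f"Nash <= opt at rate 2: {NASH_COST <= opt_double_rate}")
```

On this instance the ratio at equal traffic rates grows with p, while the comparison against twice the traffic continues to hold.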

The  Price  of  Anarchy  is  Independent  of  the  Network  Topology 

As  a  corollary  of  our  methods  for  computing  the  price  of  anarchy,  we  prove  that 
the  “steepness”  of  a  network’s  latency  functions  is  in  some  sense  the  only  cause  of 
the  inefficiency  of  Nash  equilibria,  and  that  the  complexity  of  the  network  topology 
plays  no  role.  Specifically,  we  show  the  following  under  weak  hypotheses  on  the 
class of allowable latency functions: among all multicommodity flow networks,
networks comprising only two nodes and a collection of parallel links furnish worst-case
examples  for  the  losses  due  to  selfish  routing.  Thus,  for  any  fixed  class  of  latency 
functions,  no  nontrivial  restriction  on  the  class  of  allowable  network  topologies  or 
on  the  number  of  commodities  will  improve  the  price  of  anarchy.  In  the  special  case 
of  a  class  of  latency  functions  that  includes  all  of  the  constant  functions,  we  prove 
that  a  network  with  only  two  parallel  links  suffices  to  achieve  the  worst-possible 
ratio. Informally, these results imply that the inefficiency inherent in a Nash
equilibrium stems from the inability of selfish users to discern which of two competing
routes  is  superior  and  not  from  the  topological  complexity  arising  from  the  diverse 
intersections  of  many  paths  belonging  to  different  commodities. 

Application  to  Counterintuitive  Phenomena  in  Physical  Systems 

Braess’s  Paradox  (as  described  in  Subsection  1.2.2)  is  not  particular  to  traffic  in 
networks;  perhaps  the  most  compelling  analogue  occurs  in  a  mechanical  network 
Figure 1.3: Strings and springs. Severing a taut string results in the rise of a heavy
weight. Panel (a): Before.

of strings and springs, constructed by Cohen and Horowitz [36] and shown in
Figure 1.3.⁴ In this device, one end of a spring is attached to a fixed support, and the
other  end  to  a  string.  A  second  identical  spring  is  hung  from  the  free  end  of  the 
string  and  carries  a  heavy  weight.  Finally,  strings  are  connected  (with  some  slack) 
from  the  support  to  the  upper  end  of  the  second  spring  and  from  the  lower  end 
of the first spring to the weight. Assuming that the springs are ideally elastic, the
stretched  length  of  a  spring  is  a  linear  function  of  the  force  applied  to  it.  We  may 
thus  view  the  network  of  strings  and  springs  as  a  traffic  network,  where  force  corre¬ 
sponds  to  flow  and  physical  distance  corresponds  to  latency.  With  a  suitable  choice 
of  string  and  spring  lengths  and  spring  constants,  the  equilibrium  position  of  this 
mechanical  network  is  described  by  Figure  1.3(a).  Contrary  to  intuition,  severing 
the taut string causes the weight to rise, as shown in Figure 1.3(b). The
explanation for this curiosity is the following. Initially, the two springs are connected “in
series”, and each bears the full weight and is stretched out to great length. After
cutting the taut string, the two springs are only connected “in parallel”; each spring
then  carries  only  half  of  the  weight,  and  accordingly  is  stretched  to  only  half  of 


⁴ We are indebted to Leslie Ann Goldberg for pointing out this application of our work.
its previous length. This counterintuitive effect corresponds to the improvement in
the  Nash  equilibrium  obtained  by  deleting  the  zero-latency  edge  of  Figure  1.2(b)  to 
obtain  the  network  of  Figure  1.2(a). 

Our  result  above  showing  that  the  total  latency  of  a  Nash  equilibrium  in  a 
network with linear latency functions is at most $4/3$ times that of a minimum-latency
flow  provides  a  quantitative  limit  on  the  extent  to  which  this  phenomenon  can  occur. 
In  particular,  we  show  that  this  result  implies  that  for  any  system  of  strings  and 
springs  carrying  a  single  weight,  the  distance  between  the  support  and  the  weight 
after severing an arbitrary collection of strings and springs is at least $3/4$ times the
original  support-weight  distance. 

Further  examples  of  analogous  counterintuitive  phenomena  have  been  exhibited 
in  two-terminal  electrical  networks  [36],  and  our  results  give  analogous  bounds  on  the 
largest  possible  increase  in  conductivity  obtainable  by  removing  conducting  links. 

Extensions  to  Other  Models 

Our techniques for computing the price of anarchy are not model-specific; we
demonstrate this by extending several of the above results to more general and realistic
models. In particular, we consider networks in which users can only evaluate path
latency approximately, rather than exactly; networks with a finite number of
network users, each controlling a strictly positive (as opposed to negligible) amount of
traffic;  and  a  more  general  class  of  games  that  need  not  take  place  in  a  network. 

1.3.2  Braess’s  Paradox  and  Network  Design 

In  Part  III  we  turn  our  attention  toward  coping  with  selfishness — that  is,  toward 
methods  for  designing  and  managing  networks  so  that  selfish  routing  leads  to  a 
desirable outcome. In Chapter 5, we pursue this goal via network design; namely,
armed  with  the  knowledge  that  our  networks  will  be  host  to  selfish  users,  how  can 
we  design  them  to  minimize  the  inefficiency  inherent  in  a  user-defined  equilibrium? 

A  natural  measure  for  the  performance  of  a  network  with  selfish  routing  is  the 
total  latency  of  a  Nash  equilibrium.  Recall  from  Braess’s  Paradox  (Subsection  1.2.2) 
the counterintuitive fact that removing edges from a network may improve its
performance. This observation immediately suggests the following network design
problem: given a network with latency functions on the edges and a traffic rate between
each  pair  of  vertices,  which  edges  should  be  removed  to  obtain  the  best  possible 
Nash equilibrium? Equivalently, given a large network of candidate edges to build,
which  subnetwork  will  exhibit  the  best  performance  when  used  selfishly? 

We give optimal inapproximability results and approximation algorithms for
several network design problems of this type. For example, we prove that for networks
with one source-destination pair and arbitrary (continuous and nondecreasing) edge
latency functions, there is no $(\lfloor n/2 \rfloor - \epsilon)$-approximation algorithm⁵ for network design
for any $\epsilon > 0$, where n is the number of vertices in the network (unless P = NP). We
also prove this hardness result to be best possible by exhibiting an $\lfloor n/2 \rfloor$-approximation
algorithm for the problem. For networks in which the latency of each edge is a
linear function of the congestion, we prove that there is no $(4/3 - \epsilon)$-approximation
algorithm for the problem (for any $\epsilon > 0$, unless P = NP), even in networks with
a single source-destination pair. Since a $4/3$-approximation algorithm for this special
case follows easily from our work bounding the price of anarchy, this hardness result
is sharp.

Moreover,  we  prove  that  an  optimal  approximation  algorithm  for  these  network 
design  problems  is  what  we  call  the  trivial  algorithm:  given  a  network  of  candidate 
edges,  build  the  entire  network.  As  a  consequence  of  the  optimality  of  the  trivial 
algorithm,  we  prove  that  inefficiency  due  to  harmful  extraneous  edges  (as  in  Braess’s 
Paradox)  is  impossible  to  detect  efficiently,  even  in  worst-possible  instances. 

In the course of proving our results, we introduce a new family of graphs
generalizing the network of the original Braess’s Paradox. This family may be of
independent interest, as these networks give the first demonstration that the severity of
Braess’s  Paradox  can  increase  with  the  network  size  (for  networks  with  nonlinear 
latency  functions). 

1.3.3  Stackelberg  Routing 

In  Chapter  6  we  continue  to  explore  techniques  for  coping  with  selfishness,  motivated 
by  the  following  idea.  In  some  networks,  there  will  be  a  mix  of  “selfishly  controlled” 
and “centrally controlled” traffic; that is, the network is used by both selfish
individuals and by some central authority. We study the following question: given a
network  with  centrally  and  selfishly  controlled  traffic,  how  should  centrally  controlled 
traffic  be  routed  to  induce  “good”  (albeit  selfish)  behavior  from  the  noncooperative 

⁵ A c-approximation algorithm for a minimization problem runs in polynomial time and returns
a  solution  no  more  than  c  times  as  costly  as  an  optimal  solution.  The  value  c  is  the  approximation 
ratio  or  performance  guarantee  of  the  algorithm. 
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users? 

We formulate this goal as an optimization problem via Stackelberg games, games
in which one player acts as a leader (here, the centralized authority interested in
minimizing total latency) and the rest as followers (the selfish users). The problem
is then to compute a strategy for the leader (a Stackelberg strategy) that induces the
followers  to  react  in  a  way  that  (at  least  approximately)  minimizes  the  total  latency 
in  the  network. 

We prove that it is NP-hard to compute the optimal Stackelberg strategy in
networks of parallel links and present simple strategies for such networks with provable
performance guarantees. More precisely, we give a simple algorithm that computes
a leader strategy in a network of parallel links inducing an equilibrium with total
latency no more than a constant times that of the minimum-latency flow; a simple
variant  on  Pigou’s  example  (Subsection  1.2.1)  shows  that  no  result  of  this  type  is 
possible  in  the  absence  of  centrally  controlled  traffic  and  a  Stackelberg  strategy.  We 
also  prove  stronger  performance  guarantees  for  networks  of  parallel  links  with  linear 
latency  functions. 

1.4  Comparison  to  Previous  Work 

In  this  section  we  place  our  contributions  in  context  by  describing  previous  work. 
We  confine  ourselves  to  a  high-level  review  and  to  the  most  relevant  references, 
postponing  more  specific  and  in-depth  surveys  to  later  chapters. 

Bounding  the  Price  of  Anarchy 

The traffic model studied in this thesis dates back to the 1950’s [17, 186] and has
been extensively studied ever since; we defer a survey of this literature until
Chapter 3. However, the problem of quantifying the inefficiency inherent in a user-defined
equilibrium has been considered only recently. To the best of our knowledge, the
only previous work with this goal (and the inspiration for much of the work
described in this thesis) is the paper of Koutsoupias and Papadimitriou [108].
However, the model of [108] is quite different from the one considered here (see Chapter 3
for details). More recently (and subsequent to some of our work), the model and
results of [108] have been generalized in a series of papers by Mavronicolas and
Spirakis [122], Czumaj and Vöcking [44], and Czumaj et al. [43].

Coping with Selfishness

For many decades, researchers have realized that selfish behavior can have
undesirable consequences and have proposed numerous methods for coping with it. To
mention just a few approaches ignored in this thesis, there have been significant
recent advances in controlling selfish users in communication networks via pricing
policies (see [6, 171, 174] and the references therein) and via centralized switch
service disciplines and flow control protocols (for example, see Shenker [170] and the
references therein for approaches that seek “fair” outcomes). An approach to coping
with selfishness that possesses a rich history and that has led to an abundance of
recent work is mechanism design; see the work of Nisan and Ronen [136, 137, 155]
for recent research motivated by discrete optimization problems and for pointers to
the classical literature. Most work in mechanism design is concerned with
applications that are simpler than the problem of network routing (such as auctions), and
requires  a  notion  of  currency  to  employ  some  kind  of  side  payments  to  the  selfish 
players. 

A  detailed  survey  of  previous  research  on  these  topics  is  beyond  the  scope  of  this 
thesis.  We  will  only  attempt  to  describe  work  on  the  two  methods  of  coping  with 
selfishness that we study: network design and Stackelberg routing.

Braess’s  Paradox  and  Network  Design 

Ever since Braess’s Paradox was reported [28, 128], researchers have attempted to
solve variants of the network design problem described in Subsection 1.3.2; for
example, the early work of Dafermos and Sparrow [50] alludes to such a problem.
However, progress appears to have been elusive, both computationally and
theoretically (see Chapter 5 for a survey of past efforts). Prior to our work, the network
design  problem  discussed  in  Subsection  1.3.2  was  not  known  to  be  NP-hard,  nor 
was  any  heuristic  for  the  problem  known  to  have  a  finite  approximation  ratio.  In 
addition,  our  construction  of  an  infinite  family  of  networks  generalizing  Braess’s 
Paradox  in  both  size  and  severity  appears  to  be  new. 

Stackelberg Routing

Stackelberg games and Stackelberg equilibria have been extensively studied in the
game theory literature and previously applied to problems in both networking and
other fields (see Chapter 6 for an overview). Closest to our approach are the papers
of Douligeris and Mazumdar [53] and Korilis et al. [104], which advocate system
optimization  via  Stackelberg  strategies.  In  [53],  however,  only  experimental  results 
are  reported.  Korilis  et  al.  [104]  seek  necessary  and  sufficient  conditions  for  the 
existence  of  a  leader  strategy  inducing  an  optimal  routing  of  all  of  the  traffic;  by 
contrast,  we  are  interested  in  worst-case  performance  guarantees. 

1.5  Tips  for  Reading  this  Thesis 

1.5.1  Prerequisites 

This  dissertation  assumes  relatively  few  prerequisites.  Foremost,  we  expect  the 
reader  to  be  comfortable  with  basic  concepts  of  network  flow  theory,  such  as  flows, 
cuts, and path decompositions. Our favorite reference for this material is
Tarjan [179]. On occasion we assume a nodding acquaintance with the theory of
NP-completeness, for which standard references include Garey and Johnson [77] and
Papadimitriou [141], and with the basics of linear and nonlinear programming;
accessible introductions to these two fields are given by Chvátal [34] and Peressini
et al. [145], respectively. We assume no knowledge of game theory; however, some
of our definitions and results may appear more natural to the reader familiar with
basic game-theoretic concepts. Standard introductions to game theory include
Fudenberg and Tirole [75], Osborne and Rubinstein [139], and Owen [140]. For a
gentler  overview  ideal  for  a  long  plane  flight,  we  recommend  Straffin  [177]. 

1.5.2  Presentation  Overview 

Chapter  2  is  devoted  to  technical  preliminaries.  In  that  chapter  we  formally  define 
our  traffic  model,  we  define  flows  at  Nash  equilibrium  and  prove  many  of  their  basic 
properties, and we study the optimization problem of computing the
minimum-latency traffic flow, thereby obtaining a useful characterization of such flows.

Subsequent chapters split naturally into two parts: in Part II we develop
techniques for bounding the price of anarchy and in Part III we design and analyze
methods  for  coping  with  selfishness.  More  specifically,  in  Chapter  3  we  develop 
techniques  for  computing  the  price  of  anarchy,  prove  our  bicriteria  bound  on  the 
inefficiency  of  Nash  equilibria  in  networks  with  arbitrary  latency  functions,  and 
show that the price of anarchy is independent of the network topology (see
Subsection 1.3.1 for a more detailed overview). In Chapter 4 we extend some of these
results to more general traffic models and to more general types of games. Chapter 5
studies the problem of designing networks for selfish users, and presents all of the
material outlined in Subsection 1.3.2. In the final technical chapter, Chapter 6, we
define a model of Stackelberg routing and prove the results described in
Subsection 1.3.3. Finally, in Chapter 7 we conclude with a discussion of recent work and
some  suggestions  for  further  research. 

1.5.3  Dependencies 

Chapter  2  is  a  prerequisite  for  all  that  follows,  though  some  sections  are  required  only 
for  a  subset  of  our  results;  this  is  discussed  in  detail  in  the  chapter’s  introduction. 
Chapters 3, 5, and 6 can be read independently of each other, although in Chapter 5
we  assume  some  of  the  results  (but  not  the  proof  techniques)  of  Chapter  3.  Finally, 
Chapter  4  is  meant  to  be  read  following  Chapter  3. 

1.6  Bibliographic  Notes 

Most of the work reported in this thesis has appeared previously in research
papers [159, 160, 161, 162, 163, 164, 165]. Chapter 4 and portions of Chapter 3 and
Appendix A are joint work with Éva Tardos and appeared in [164, 165]; the rest of
Chapter 3 is drawn from [160, 163]. The results of Chapter 5 appeared in [159], the
results of Chapter 6 in [161], and Theorem A.3.1 of Section A.3 in [162].


Chapter  2 
Preliminaries 


In  this  chapter  we  present  the  basic  definitions  and  preliminary  technical  results 
needed  in  the  rest  of  this  work.  In  Section  2.1  we  formally  define  the  traffic  model 
discussed  in  Chapter  1.  In  Section  2.2  we  define  flows  at  Nash  equilibrium  and  prove 
some of their basic properties. Section 2.3 gives a characterization of
minimum-latency flows that is crucial for a majority of our results. In Section 2.4 we illustrate
the definitions and propositions of Sections 2.1-2.3 with several concrete examples.
In  Section  2.5  we  build  on  the  results  of  Section  2.3  to  prove  the  existence  and 
essential  uniqueness  of  flows  at  Nash  equilibrium,  and  in  Section  2.6  we  conclude  by 
proving  further  useful  properties  about  flows  at  Nash  equilibrium. 

Different  portions  of  this  dissertation  depend  on  different  subsets  of  this  chapter; 
these  dependencies  are  described  in  detail  in  Table  2.1. 

2.1  The  Model 

We consider a directed network $G = (V, E)$ with vertex set $V$, edge set $E$, and $k$
source-destination vertex pairs $\{s_1, t_1\}, \ldots, \{s_k, t_k\}$. We allow parallel edges between
vertices but have no use for self-loops. We will sometimes refer to vertices as nodes
and to edges as links. We denote the set of (simple) $s_i$-$t_i$ paths by $\mathcal{P}_i$, and define
$\mathcal{P} = \bigcup_i \mathcal{P}_i$. To avoid trivialities, we will always assume that $\mathcal{P}_i \neq \emptyset$ for each $i$. A
flow is a function $f : \mathcal{P} \to \mathcal{R}^+$; for a fixed flow $f$ and an edge $e \in E$ we define
$f_e = \sum_{P : e \in P} f_P$ to be the total amount of flow on edge $e$. We sometimes refer to
a source-destination pair $\{s_i, t_i\}$ and the $s_i$-$t_i$ paths $\mathcal{P}_i$ as commodity $i$. When we
wish to concentrate on the flow of a particular commodity $i$, we write $f^i$ for the
restriction of $f$ to $\mathcal{P}_i$ and $f_e^i$ for $\sum_{P \in \mathcal{P}_i : e \in P} f_P$.

Section    Prerequisite for...
2.1        rest of thesis
2.2        rest of thesis
2.3        Sections 2.4-2.5, 3.2-3.5, 4.1-4.2, 4.4, Chapter 6
2.4        none
2.5        rest of thesis/Section A.1
2.6        Section 5.4

Table 2.1: What in Chapter 2 is useful where. Section 2.4 is not logically necessary
for what follows, but Subsections 2.4.1, 2.4.2, and 2.4.4 will show up frequently as
examples in later chapters. The proof techniques of Section 2.5 are needed only in
Section A.1, but we will use the fact that Nash flows exist and are essentially unique
in the rest of the thesis.

We associate a finite and positive traffic rate $r_i$ with each pair $\{s_i, t_i\}$, the amount
of flow with source $s_i$ and destination $t_i$; a flow $f$ is said to be feasible if for all $i$,
$\sum_{P \in \mathcal{P}_i} f_P = r_i$. Finally, each edge $e \in E$ is given a congestion-dependent latency
that we denote by $\ell_e(\cdot)$. For each edge $e \in E$, we assume that the latency function $\ell_e$
is nonnegative, continuous, and nondecreasing. The latency of a path $P$ with respect
to a flow $f$ is defined as the sum of the latencies of the edges in the path, denoted
by $\ell_P(f) = \sum_{e \in P} \ell_e(f_e)$. We will call the triple $(G, r, \ell)$ an instance.

We define the cost $C(f)$ of a flow $f$ as the total latency incurred by $f$, i.e.,
$C(f) = \sum_{P \in \mathcal{P}} \ell_P(f) f_P$. By summing over the edges in a path $P$ and reversing the
order of summation, we may also write

$$C(f) = \sum_{e \in E} \ell_e(f_e) f_e.$$

With respect to an instance $(G, r, \ell)$, a feasible flow minimizing $C(f)$ is said to
be optimal or minimum-latency; such a flow always exists because the space of all
feasible flows is a compact set and our cost function is continuous.
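
To make the two expressions for the cost concrete, the following sketch evaluates both the path-based sum and the edge-based sum on a small instance. The instance is hypothetical and chosen only for illustration (the edge names a, b, c, the latency functions, and the flow values do not appear elsewhere in this thesis), and the function names are ours.

```python
from fractions import Fraction

# Hypothetical toy instance: one commodity with traffic rate 1, three edges,
# and two s-t paths that share edge "a".
latency = {
    "a": lambda x: x,            # l_a(x) = x
    "b": lambda x: Fraction(1),  # l_b(x) = 1
    "c": lambda x: 2 * x,        # l_c(x) = 2x
}
paths = {"P1": ("a", "b"), "P2": ("a", "c")}
flow = {"P1": Fraction(3, 4), "P2": Fraction(1, 4)}  # f_P for each path


def edge_flows(paths, flow):
    """f_e = sum of f_P over the paths P that contain edge e."""
    fe = {}
    for name, edges in paths.items():
        for e in edges:
            fe[e] = fe.get(e, 0) + flow[name]
    return fe


def path_latency(edges, fe, latency):
    """l_P(f) = sum of l_e(f_e) over the edges e of P."""
    return sum(latency[e](fe[e]) for e in edges)


def cost_by_paths(paths, flow, latency):
    """C(f) as a sum over paths of l_P(f) * f_P."""
    fe = edge_flows(paths, flow)
    return sum(path_latency(edges, fe, latency) * flow[name]
               for name, edges in paths.items())


def cost_by_edges(paths, flow, latency):
    """C(f) as a sum over edges of l_e(f_e) * f_e."""
    fe = edge_flows(paths, flow)
    return sum(latency[e](x) * x for e, x in fe.items())
```

On this instance both sums evaluate to 15/8, as the reversal of the order of summation predicts.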

Remark 2.1.1 We will see in the coming sections that our assumptions that
latency functions are nonnegative, continuous, and nondecreasing are essential for
our theory of inefficiency of Nash equilibria in networks. We believe these
assumptions to be reasonable for most network applications. Functions that violate the
continuity assumption (for example, step functions) can be made continuous (even
differentiable) with little loss of information. While some researchers have described
scenarios  where  nonmonotone  cost  functions  are  natural1,  the  applications  we  have 
in  mind — routing  traffic  in  computer  and  road  networks — should  always  satisfy  the 
required  monotonicity  condition. 

2.2  Flows  at  Nash  Equilibrium 

We  wish  to  study  flows  that  represent  an  equilibrium  among  many  noncooperative 
network  users — that  is,  flows  that  behave  “greedily”  or  “selfishly”,  without  regard 
to  the  overall  cost.  We  intuitively  expect  each  unit  of  such  a  flow  (no  matter  how 
small)  to  travel  along  the  minimum-latency  path  available  to  it,  where  latency  is 
measured  with  respect  to  the  rest  of  the  flow;  otherwise,  this  flow  would  reroute 
itself  on  a  path  with  smaller  latency.  Following  Dafermos  and  Sparrow  [50],  we 
formalize  this  idea  in  the  next  definition. 

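Definition 2.2.1 A flow $f$ feasible for instance $(G, r, \ell)$ is at Nash equilibrium (or
is a Nash flow) if for all $i \in \{1, \ldots, k\}$, $P_1, P_2 \in \mathcal{P}_i$ with $f_{P_1} > 0$, and $\delta \in (0, f_{P_1}]$,
we have $\ell_{P_1}(f) \le \ell_{P_2}(\tilde{f})$, where

$$\tilde{f}_P = \begin{cases} f_P - \delta & \text{if } P = P_1 \\ f_P + \delta & \text{if } P = P_2 \\ f_P & \text{if } P \notin \{P_1, P_2\}. \end{cases}$$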

Letting $\delta$ tend to 0, continuity and monotonicity of the edge latency functions
give  the  following  useful  characterization  of  a  flow  at  Nash  equilibrium,  occasionally 
called  a  Wardrop  equilibrium  [84]  or  Wardrop’s  Principle  [175,  176]  in  the  literature, 
due  to  an  influential  paper  of  Wardrop  [186]. 

Proposition 2.2.2 A flow $f$ feasible for instance $(G, r, \ell)$ is at Nash equilibrium if
and only if for every $i \in \{1, \ldots, k\}$ and $P_1, P_2 \in \mathcal{P}_i$ with $f_{P_1} > 0$, $\ell_{P_1}(f) \le \ell_{P_2}(f)$.

Remark  2.2.3  While  Definition  2.2.1  still  makes  sense  without  assuming  continuity 
and  monotonicity  of  the  edge  latency  functions,  Proposition  2.2.2  fails  if  either  of 
these  hypotheses  is  omitted  (the  forward  direction  fails  in  the  absence  of  continuity 
and the reverse direction fails in the absence of monotonicity; see Section B.1).

1 For example, Blonski [22] points out that people typically have nonmonotone preferences about
the congestion in a restaurant or at a concert, preferring a moderate crowd to total isolation or to
being packed in like a sardine.

Briefly, Proposition 2.2.2 states that, in a flow at Nash equilibrium, all flow
travels on minimum-latency paths. In particular, if $f$ is at Nash equilibrium then
all $s_i$-$t_i$ flow paths (paths to which $f$ assigns a positive amount of flow) have
equal latency, say $L_i(f)$. We can thus express the cost $C(f)$ of a flow $f$ at Nash
equilibrium in a particularly nice form.

Proposition 2.2.4 If $f$ is a flow at Nash equilibrium for instance $(G, r, \ell)$, then

$$C(f) = \sum_{i=1}^{k} L_i(f) r_i.$$
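
The identity can be checked in one line. The following is a sketch rather than a formal proof; it writes $L_i(f)$ for the common latency of the $s_i$-$t_i$ flow paths, uses feasibility of $f$, and assumes that the path sets $\mathcal{P}_i$ of distinct commodities are disjoint.

```latex
\begin{align*}
C(f) &= \sum_{P \in \mathcal{P}} \ell_P(f)\, f_P
      = \sum_{i=1}^{k} \; \sum_{P \in \mathcal{P}_i : f_P > 0} \ell_P(f)\, f_P \\
     &= \sum_{i=1}^{k} L_i(f) \sum_{P \in \mathcal{P}_i} f_P
      = \sum_{i=1}^{k} L_i(f)\, r_i .
\end{align*}
```

The second equality drops paths carrying no flow, the third uses that every remaining $s_i$-$t_i$ path has latency $L_i(f)$, and the last uses the feasibility constraint for commodity $i$.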

Remark  2.2.5  For  the  next  two  sections  we  will  take  for  granted  that  Nash  flows 
exist  and  are  essentially  unique;  this  will  be  proved  in  Section  2.5. 

Remark  2.2.6  Our  definition  of  a  flow  at  Nash  equilibrium  corresponds  to  an 
equilibrium  in  which  each  network  user  chooses  a  single  path  of  the  network  (a 
pure strategy), whereas in classical game theory a Nash equilibrium is defined via
mixed  strategies  (with  players  of  a  game  choosing  probability  distributions  over  pure 
strategies).  However,  since  in  our  model  each  network  user  controls  only  a  negligible 
fraction  of  the  overall  traffic,  these  two  definitions  are  essentially  equivalent — see  [84] 
for  a  rigorous  discussion. 


2.3  A  Characterization  of  Optimal  Flows 

We  now  investigate  the  properties  of  an  optimal  flow — that  is,  of  a  flow  minimizing 
the total latency. Recalling that the cost of a flow $f$ may be expressed $C(f) =
\sum_{e \in E} \ell_e(f_e) f_e$, the problem of finding a minimum-latency feasible flow in a network
is a special case of the nonlinear program

$$\begin{array}{llll}
\text{(NLP)} & \text{Min} & \sum_{e \in E} c_e(f_e) & \\
& \text{subject to:} & \sum_{P \in \mathcal{P}_i} f_P = r_i & \forall i \in \{1, \ldots, k\} \\
& & f_e = \sum_{P \in \mathcal{P} : e \in P} f_P & \forall e \in E \\
& & f_P \ge 0 & \forall P \in \mathcal{P}
\end{array}$$

where in our problem, $c_e(f_e) = \ell_e(f_e) f_e$.

For simplicity we have given a formulation with an exponential number of
variables, but it is not difficult to give an equivalent compact formulation (with decision
variables only on edges and explicit node conservation constraints) that requires
only polynomially many variables and constraints.

Next,  we  characterize  the  local  optima  of  (NLP).  Roughly,  we  expect  a  flow 
to  be  locally  optimal  if  and  only  if  moving  flow  from  one  path  to  another  can  only 
increase  the  flow’s  cost.  Put  differently,  we  expect  a  flow  to  be  locally  optimal 
when the marginal benefit of decreasing flow along any $s_i$-$t_i$ flow path is at most
the marginal cost of increasing flow along any other $s_i$-$t_i$ path. Since the local and
global minima of a convex function on a convex set coincide (see, e.g., [145, Thm
2.3.4]), this condition should be necessary and sufficient for a flow to be globally
optimal whenever the objective function of (NLP) is convex.2 This is the case
when, for example, for each edge $e \in E$ we have $c_e(f_e) = \ell_e(f_e) f_e$ with a convex
latency function $\ell_e$.

We  formalize  this  characterization  of  global  optima  of  convex  programs  of  the 
form (NLP) in the next lemma. For a differentiable cost function $c_e$, let $c_e'$ denote
the derivative $\frac{d}{dx} c_e(x)$ of $c_e$ and define $c_P'(f)$ by $c_P'(f) = \sum_{e \in P} c_e'(f_e)$. We then have
the following.3

Proposition 2.3.1 ([17, 50]) A flow $f$ is optimal for a convex program of the form
(NLP) with differentiable cost functions if and only if for every $i \in \{1, \ldots, k\}$ and
$P_1, P_2 \in \mathcal{P}_i$ with $f_{P_1} > 0$, $c_{P_1}'(f) \le c_{P_2}'(f)$.

The  striking  similarity  between  the  characterizations  of  optimal  solutions  to  a 
convex  program  of  the  form  (NLP)  and  of  flows  at  Nash  equilibrium  was  noticed 
early  on  by  Beckmann  et  al.  [17],  and  provides  an  interpretation  of  an  optimal  flow 
as  a  flow  at  Nash  equilibrium  with  respect  to  a  different  set  of  edge  latency  functions. 
To make this relationship precise, denote the marginal cost of increasing flow on edge
$e$ with differentiable latency function $\ell_e$ by $\ell_e^*(x) = \frac{d}{dy}(y \cdot \ell_e(y))(x) = \ell_e(x) + x \cdot \ell_e'(x)$.
Propositions  2.2.2  and  2.3.1  then  yield  the  following  corollary. 

Corollary 2.3.2 ([17, 50]) Let $(G, r, \ell)$ be an instance with differentiable latency
functions in which $x \cdot \ell_e(x)$ is a convex function for each edge $e$, with marginal cost
functions $\ell^*$ defined as above. Then a flow $f$ feasible for $(G, r, \ell)$ is optimal if and
only if it is at Nash equilibrium for the instance $(G, r, \ell^*)$.

2 A function $f$ defined on a convex subset $S$ of $\mathcal{R}^n$ is convex if $f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)$
for all $x, y \in S$ and $\lambda \in [0, 1]$. Some authors call such functions weakly convex.

3 For a formal derivation via the Karush-Kuhn-Tucker Theorem [145], see [17, 50].

Remark 2.3.3 We will typically denote a minimum-latency flow for an instance by
$f^*$. The marginal cost functions are denoted by $\ell^*$ since they are “optimal latency
functions” in a sense made precise by Corollary 2.3.2: the optimal flow $f^*$ arises as
a flow at Nash equilibrium with respect to latency functions $\ell^*$.

Remark 2.3.4 The function $\ell_e^*(x)$ describing the marginal cost of increasing flow on
edge $e$ has one term $\ell_e(x)$ capturing the per-unit latency incurred by the additional
flow and a second term $x \cdot \ell_e'(x)$ accounting for the increased congestion experienced
by  the  flow  already  using  the  edge.  Essentially,  the  only  difference  between  an 
optimal  flow  and  a  flow  at  Nash  equilibrium  is  that  the  former  accounts  for  this 
“conscientious”  second  term  while  the  latter  disregards  it. 
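
As a quick illustration of the formula $\ell^*(x) = \ell(x) + x \cdot \ell'(x)$, we work out the marginal cost function of three simple latency functions. The constants $a$, $b$, and $p$ are generic parameters introduced here only for the illustration.

```latex
\begin{align*}
\ell(x) &= 1      &\Longrightarrow\quad \ell^*(x) &= 1 + x \cdot 0 = 1, \\
\ell(x) &= ax + b &\Longrightarrow\quad \ell^*(x) &= (ax + b) + x \cdot a = 2ax + b, \\
\ell(x) &= x^p    &\Longrightarrow\quad \ell^*(x) &= x^p + x \cdot p\,x^{p-1} = (p + 1)\,x^p .
\end{align*}
```

In each case the first summand is the latency seen by the additional flow and the second is the extra congestion imposed on the flow already present; for a constant latency function the second term vanishes, so selfish and optimal routing value such an edge identically.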

The  conclusion  of  Proposition  2.3.1  is  false  without  the  convexity  hypothesis. 
Instances  to  which  Proposition  2.3.1  and  Corollary  2.3.2  apply  (those  with  latency 
functions satisfying the aforementioned convexity assumption) will play an
important role in portions of this dissertation (in particular, in Sections 3.3-3.5 and in
Chapter  6),  important  enough  to  warrant  further  terminology. 

Definition 2.3.5 A latency function $\ell$ is standard if it is differentiable and if $x \cdot \ell(x)$
is convex on $[0, \infty)$.

Most  but  not  all  latency  functions  of  interest  are  standard.  All  differentiable 
convex functions are standard, as are some well-behaved nonconvex functions such
as $\log(1 + x)$. Differentiable approximations of step functions are the most notable
examples  of  nonstandard  latency  functions. 
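
To see why a nonconvex latency function such as $\log(1 + x)$ can still be standard, one can differentiate $x \cdot \ell(x)$ twice. The computation below is a short check under the assumption that $\log$ denotes the natural logarithm; the name $g$ is ours.

```latex
\begin{align*}
g(x)   &= x \log(1 + x), \\
g'(x)  &= \log(1 + x) + \frac{x}{1 + x}, \\
g''(x) &= \frac{1}{1 + x} + \frac{1}{(1 + x)^2} = \frac{x + 2}{(1 + x)^2} > 0
          \quad \text{for all } x \ge 0 .
\end{align*}
```

Thus $g$ is convex on $[0, \infty)$ even though the second derivative of $\log(1 + x)$ itself, $-1/(1 + x)^2$, is negative there.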

We  conclude  this  section  with  a  final  fact  about  networks  with  standard  latency 
functions,  useful  in  Chapter  6:  since  (NLP)  is  a  convex  program  when  all  edge 
latency  functions  are  standard,  the  optimal  flow  of  such  an  instance  can  be  found 
efficiently. 

Fact 2.3.6 If $(G, r, \ell)$ is an instance with standard latency functions, then the
optimal flow for $(G, r, \ell)$ can be computed in polynomial time (up to an arbitrarily small
additive constant).

This algorithmic task can be accomplished with the ellipsoid method, even when
network latency functions are not described by explicit formulae and are instead
given only as oracles [81]. Under additional assumptions on the latency functions
(e.g., that latency functions are sufficiently smooth with efficiently computable first
and second derivatives), standard interior-point techniques (as described in, for
example, Renegar [152]) can be used. In the special case of an instance with a single
commodity, the problem reduces to that of computing a min-cost flow with
respect to a convex separable objective function, a problem for which combinatorial
polynomial-time algorithms are known; see [3, 19, 87] for a survey of the available
techniques.

Remark  2.3.7  The  additive  error  in  Fact  2.3.6  is  required,  as  an  exact  description 
of  the  optimal  flow  may  require  irrational  numbers. 
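
For intuition only, the following sketch computes an approximately optimal flow in the very special case of two parallel links. It is not the ellipsoid method or any of the algorithms cited above; it is a plain bisection that, in the spirit of Corollary 2.3.2, searches for the split equalizing the two marginal costs, and it assumes the marginal cost functions are nondecreasing (as they are for standard latency functions). The function name `optimal_split` and its parameters are hypothetical.

```python
def optimal_split(mc_lower, mc_upper, rate=1.0, tol=1e-12):
    """Approximate amount of flow to route on the lower of two parallel links.

    mc_lower and mc_upper are the nondecreasing marginal cost functions of
    the two links. The returned x approximately equalizes mc_lower(x) and
    mc_upper(rate - x), or is an endpoint of [0, rate] if no interior
    point does.
    """
    def gap(x):
        # Nondecreasing in x when both marginal costs are nondecreasing.
        return mc_lower(x) - mc_upper(rate - x)

    if gap(0.0) >= 0:     # lower link is never the cheaper one at the margin
        return 0.0
    if gap(rate) <= 0:    # lower link stays cheaper at the margin throughout
        return rate
    lo, hi = 0.0, rate    # invariant: gap(lo) < 0 <= gap(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Two links with latencies 1 (upper) and x (lower): marginal costs 1 and 2x.
x = optimal_split(lambda y: 2 * y, lambda y: 1.0)
cost = x * x + (1 - x) * 1.0  # latency x on x units, latency 1 on the rest
```

With these two latency functions and traffic rate 1 the search settles at a split of about one half and a cost of about 3/4; the tolerance `tol` plays the role of the additive error in Fact 2.3.6.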

2.4  Examples 

In  this  section  we  illustrate  the  definitions  and  characterizations  of  the  previous 
sections  in  some  concrete  networks,  and  hope  to  develop  the  reader’s  intuition  about 
Nash  and  optimal  flows.  We  first  return  to  the  familiar  examples  of  Subsection  1.2 
(Pigou’s  example  and  Braess’s  Paradox)  and  then  present  three  more  examples  that 
demonstrate  further  differences  between  Nash  and  optimal  flows. 

Remark  2.4.1  For  simplicity,  we  have  chosen  examples  in  which  all  traffic  shares 
the  same  source  and  destination.  However,  we  are  also  interested  in  (and  most  of 
our  results  will  apply  to)  networks  in  which  different  users  have  different  sources 
and  destinations. 

2.4.1  Pigou’s  Example 

Recall that, in Pigou’s example of Subsection 1.2.1, we have a network with two
nodes $s$ and $t$, two parallel edges with latency functions $\ell(x) = 1$ and $\ell(x) = x$, and
a traffic rate of 1 (see Figure 2.1(a)). Routing all flow on the bottom link equalizes
the latencies of the two available $s$-$t$ paths at 1, and thus by Proposition 2.2.2
provides a flow $f$ at Nash equilibrium. By Proposition 2.2.4 (or by inspection), the
cost $C(f)$ of $f$ is 1.
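
The equilibrium condition of Proposition 2.2.2 is easy to test mechanically on a network of parallel links, where every link is its own $s$-$t$ path. The sketch below does so for Pigou's example; the helper name `is_nash_parallel` is ours, and we assume (as in the text) that the upper link has constant latency 1 and the lower link has latency $x$.

```python
from fractions import Fraction


def is_nash_parallel(latencies, flow):
    """Check the condition of Proposition 2.2.2 on parallel s-t links.

    Every link that carries flow must have latency no larger than that of
    any other link.
    """
    current = [l(x) for l, x in zip(latencies, flow)]
    return all(current[a] <= current[b]
               for a in range(len(flow)) if flow[a] > 0
               for b in range(len(flow)))


# Pigou's example: upper link l(x) = 1, lower link l(x) = x, traffic rate 1.
pigou = [lambda x: Fraction(1), lambda x: x]

all_on_lower = [Fraction(0), Fraction(1)]
even_split = [Fraction(1, 2), Fraction(1, 2)]

nash_ok = is_nash_parallel(pigou, all_on_lower)  # both links have latency 1
split_ok = is_nash_parallel(pigou, even_split)   # upper link: 1 > 1/2
nash_cost = sum(l(x) * x for l, x in zip(pigou, all_on_lower))
```

Here `nash_ok` is true and `nash_cost` equals 1, while `split_ok` is false: under an even split the traffic on the upper link would prefer to move to the lower link.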

Next, notice that the marginal cost functions of the network are $\ell^*(x) = 1$ and
$\ell^*(x) = 2x$ (see Figure 2.1(b)). Routing half of the traffic on each link thus equalizes
the marginal costs of the two $s$-$t$ paths at 1, and so by Corollary 2.3.2 furnishes a
minimum-latency flow $f^*$. The cost of $f^*$ is $C(f^*) = \frac{1}{2} \cdot \frac{1}{2} + \frac{1}{2} \cdot 1 = \frac{3}{4}$.

Figure 2.1: Pigou’s example revisited. Panels: (a) Latency Functions, (b) Marginal Cost Functions.

Figure 2.2: Braess’s Paradox revisited. Panels: (a) Latency Functions, (b) Marginal Cost Functions.

2.4.2  Braess’s  Paradox 

Next we consider the network of Braess’s Paradox (Subsection 1.2.2) after the
addition of the zero-latency edge; see Figure 2.2(a). Setting the traffic rate $r$ to 1, we
see that the flow $f$ that routes all traffic on the path $s \to v \to w \to t$ equalizes the
latency of the three $s$-$t$ paths at 2, and thus (by Proposition 2.2.2) $f$ is at Nash
equilibrium with $C(f) = 2$. Switching to marginal cost functions (see Figure 2.2(b)),
we find that the flow $f^*$ that routes half the traffic on each of the two two-hop
paths equalizes the marginal costs of the three $s$-$t$ paths at 2, and is therefore (by
Corollary 2.3.2) optimal. The cost $C(f^*)$ of $f^*$ is $\frac{3}{2}$.
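
The numbers in this example can be reproduced with a few lines of code. The sketch below assumes the usual labeling of the Braess network (latency $x$ on edges $s \to v$ and $w \to t$, latency 1 on edges $v \to t$ and $s \to w$, latency 0 on edge $v \to w$); since the figure is not reproduced here, this labeling and all identifiers are assumptions made for the illustration.

```python
from fractions import Fraction

# Assumed edge latencies for the network of Figure 2.2(a).
latency = {
    ("s", "v"): lambda x: x,
    ("v", "t"): lambda x: Fraction(1),
    ("s", "w"): lambda x: Fraction(1),
    ("w", "t"): lambda x: x,
    ("v", "w"): lambda x: Fraction(0),
}
# Marginal costs l*(x) = l(x) + x * l'(x), worked out by hand for each edge.
marginal = {
    ("s", "v"): lambda x: 2 * x,
    ("v", "t"): lambda x: Fraction(1),
    ("s", "w"): lambda x: Fraction(1),
    ("w", "t"): lambda x: 2 * x,
    ("v", "w"): lambda x: Fraction(0),
}
paths = {
    "s-v-t": [("s", "v"), ("v", "t")],
    "s-w-t": [("s", "w"), ("w", "t")],
    "s-v-w-t": [("s", "v"), ("v", "w"), ("w", "t")],
}


def path_values(flow, edge_fn):
    """Sum edge_fn along each path, evaluated at the edge flows induced by flow."""
    fe = {e: Fraction(0) for e in latency}
    for name, amount in flow.items():
        for e in paths[name]:
            fe[e] += amount
    return {name: sum(edge_fn[e](fe[e]) for e in edges)
            for name, edges in paths.items()}


def cost(flow):
    """Total latency C(f) as a sum over paths of l_P(f) * f_P."""
    lat = path_values(flow, latency)
    return sum(lat[name] * amount for name, amount in flow.items())


nash = {"s-v-t": Fraction(0), "s-w-t": Fraction(0), "s-v-w-t": Fraction(1)}
opt = {"s-v-t": Fraction(1, 2), "s-w-t": Fraction(1, 2), "s-v-w-t": Fraction(0)}

nash_latencies = path_values(nash, latency)  # latency of each path under f
opt_marginals = path_values(opt, marginal)   # marginal cost of each path under f*
```

Under these assumptions all three entries of `nash_latencies` and of `opt_marginals` equal 2, `cost(nash)` is 2, and `cost(opt)` is 3/2, matching the values derived above.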
Figure 2.3: The Nash flow may be strictly Pareto-dominated by the optimal flow in
the absence of Braess’s Paradox.

2.4.3  Strict  Pareto  Suboptimality  of  Nash  without  Paradox 

In  Subsection  1.2.2  we  remarked  that  Braess’s  Paradox  demonstrates  two  different 
principles.  First,  all  traffic  may  strictly  prefer  a  coordinated  outcome  to  the  flow 
at  Nash  equilibrium4;  second,  if  network  users  route  selfishly,  then  augmenting  a 
network  with  an  additional  link  may  strictly  increase  everyone’s  latency.  While  the 
second  phenomenon  implies  the  first  in  the  augmented  network  (by  coordinating, 
additional  links  can  always  be  ignored),  the  converse  is  not  true.  Put  differently, 
there  are  networks  in  which  the  flow  at  Nash  equilibrium  is  strictly  Pareto-dominated 
by  an  optimal  flow  and  yet  cannot  be  improved  by  the  deletion  of  any  number  of 
network  links. 

To see this, consider the network shown in Figure 2.3, two copies of the network
of Pigou’s example glued together in series. Setting the traffic rate $r$ to be 1, the
flow $f$ that routes all traffic along the two bottom links equalizes the latency of
all four $s$-$t$ paths at 2 and is thus at Nash equilibrium. On the other hand, in the
optimal flow $f^*$ that routes half of the traffic on the path comprising the top link
of the first subnetwork and the bottom link of the second subnetwork and the rest of
the traffic on the path comprising the other two links, all traffic experiences only $\frac{3}{2}$
units of latency. All traffic is thus better off in the flow $f^*$ than in the Nash flow $f$.
Moreover, it is easy to check that the Nash flow does not improve when any subset
of the links of the network is removed.
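
The path latencies quoted for this example can be tabulated directly. The sketch below assumes, as in Pigou's example, that each copy has a top link with latency 1 and a bottom link with latency $x$; the function and variable names are ours.

```python
from fractions import Fraction
from itertools import product


def top(x):
    return Fraction(1)  # l(x) = 1


def bottom(x):
    return x            # l(x) = x


def path_latencies(stage1, stage2):
    """Latency of each of the four s-t paths.

    stage1 and stage2 map "top"/"bottom" to the flow on that link of the
    first and second copy of Pigou's network.
    """
    lat1 = {"top": top(stage1["top"]), "bottom": bottom(stage1["bottom"])}
    lat2 = {"top": top(stage2["top"]), "bottom": bottom(stage2["bottom"])}
    return {(a, b): lat1[a] + lat2[b] for a, b in product(lat1, lat2)}


one, half, zero = Fraction(1), Fraction(1, 2), Fraction(0)

# Nash flow f: all traffic on the two bottom links.
nash = path_latencies({"top": zero, "bottom": one},
                      {"top": zero, "bottom": one})

# Flow f*: half on (top, bottom) and half on (bottom, top), so every link
# carries half a unit of traffic.
opt = path_latencies({"top": half, "bottom": half},
                     {"top": half, "bottom": half})
```

All four entries of `nash` equal 2, while the two paths used by $f^*$ have latency 3/2. The unused path through both bottom links has latency only 1 under $f^*$, which is one way to see that $f^*$ is not itself at Nash equilibrium.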

4 In the language of economics, we would say that the Nash flow is strictly Pareto-dominated by
the optimal flow.

Figure 2.4: A nonlinear variant of Pigou’s example. Panels: (a) Latency Functions, (b) Marginal Cost Functions.

2.4.4  A  Nonlinear  Variant  of  Pigou’s  Example 

In all three of our examples thus far, the Nash flow fails to minimize the total latency
and is a factor of precisely $\frac{4}{3}$ more costly than the optimal flow. This is not entirely
a  coincidence,  as  in  the  next  chapter  we  will  see  that  no  worse  ratio  is  possible  in  any 
multicommodity  flow  network  provided  the  latency  of  every  edge  increases  linearly 
with  the  edge  congestion  (as  is  the  case  in  the  previous  three  examples).  We  now 
show  that  this  strong  hypothesis  on  the  network  latency  functions  is  necessary  for 
such  a  strong  result,  and  that  flows  at  Nash  equilibrium  can  be  arbitrarily  more 
costly  than  optimal  flows  in  networks  with  nonlinear  edge  latency  functions. 

Consider  the  minor  modification  of  Pigou’s  example  shown  in  Figure  2.4(a), 
where we have replaced the latency function $\ell(x) = x$ by the highly nonlinear one
$\ell(x) = x^p$ (for concreteness, think of $p$ as 100 or 1000). With the usual traffic
rate of 1, the Nash flow $f$ is the same as in Pigou’s example; all flow is routed
on the bottom link and the total latency is 1 (for any choice of $p$). On the other
hand, the discrepancy between the latency functions (in Figure 2.4(a)) and the
marginal cost functions (in Figure 2.4(b)) is much larger; now, the flow $f^*$ that
routes $(p+1)^{-1/p}$ units on the lower link and the remainder on the upper link
equalizes the marginal cost of the two links at 1 and is thus optimal. The cost
$C(f^*)$ of $f^*$ is $1 - p \cdot (p+1)^{-(p+1)/p}$, which tends to 0 as $p \to \infty$. Thus, if arbitrarily
steep  latency  functions  are  allowed  (even  restricting  to  polynomials),  a  flow  at  Nash 
equilibrium  can  be  arbitrarily  more  costly  than  an  optimal  flow.  This  negative  result 
motivates  our  work  on  bicriteria  bounds  for  Nash  flows  in  networks  with  arbitrary 
latency  functions  (Section  3.6)  and  on  ensuring  that  the  cost  of  a  selfish  solution  in 
such  a  network  is  close  to  optimal  by  carefully  routing  a  small  fraction  of  the  traffic 
centrally  (Chapter  6). 
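As a numerical illustration of the computation in this subsection, the following minimal Python sketch evaluates the optimal flow and the ratio $C(f)/C(f^*)$ for several values of $p$. The function name `nonlinear_pigou` and the chosen values of $p$ are ours and serve only as an example; the formulas are the ones derived above.

```python
def nonlinear_pigou(p):
    """Two parallel links, traffic rate 1: upper latency 1, lower latency x**p."""
    # Nash flow: all traffic uses the lower link and experiences latency 1.
    nash_cost = 1.0
    # Optimal flow: equalize marginal costs, (p + 1) * x**p = 1 on the lower link.
    x_lower = (p + 1) ** (-1.0 / p)
    # Total latency = flow * latency on the lower link + flow * 1 on the upper link.
    opt_cost = x_lower * x_lower ** p + (1.0 - x_lower) * 1.0
    return x_lower, opt_cost, nash_cost / opt_cost


for p in (1, 2, 10, 100, 1000):
    x_lower, opt_cost, ratio = nonlinear_pigou(p)
    print(f"p={p:5d}  lower-link flow={x_lower:.4f}  C(f*)={opt_cost:.4f}  ratio={ratio:.2f}")
```

For $p = 1$ the sketch recovers the ratio 4/3 of Pigou’s example, and the ratio grows without bound as $p$ increases.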
(a) Latency Functions  (b) Marginal Cost Functions

Figure  2.5:  The  optimal  flow  may  sacrifice  some  traffic  to  a  path  with  large  latency 
to  minimize  the  total  latency. 

2.4.5  The  Unfairness  of  Optimal  Flows 

In  all  of  our  examples  thus  far,  the  optimal  flow  has  been  superior  to  the  Nash 
flow  in  a  very  strong  sense.  Rather  than  merely  achieving  a  smaller  total  latency 
than  a  Nash  flow,  in  the  previous  examples  all  traffic  is  at  least  as  well-off  in  the 
optimal  flow  as  in  the  flow  at  Nash  equilibrium;  that  is,  the  optimal  flow  has  Pareto- 
dominated  the  flow  at  Nash  equilibrium.  Our  next  example  shows  that  this  will  not 
always  be  the  case;  in  general,  an  optimal  flow  will  sacrifice  some  traffic  to  paths 
with  large  latency  in  order  to  minimize  the  total  latency  experienced  by  all  of  the 
traffic. 

Consider the network of Figure 2.5(a), a small variation on Pigou’s example in
which we replace the latency function $\ell(x) = 1$ with the latency function $\ell(x) = 2 - \epsilon$
for a very small positive constant $\epsilon > 0$. As usual, we set the traffic rate $r$ to 1. In
the flow at Nash equilibrium, all traffic is routed on the bottom link and experiences
one unit of latency. In the optimal flow, however, only $1 - \epsilon/2$ units of flow are routed
on the bottom link; routing the other $\epsilon/2$ units of traffic on the upper link equalizes
the marginal costs of the two edges at $2 - \epsilon$. Intuitively, a small fraction of the traffic
is  sacrificed  to  the  slow  edge  in  order  to  (slightly)  reduce  the  congestion  experienced 
by  the  overwhelming  majority  of  network  users. 
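The trade-off in this example can be checked numerically. The sketch below is only an illustration under the assumptions stated in its comments ($0 < \epsilon < 1$, traffic rate 1); the function name `unfair_pigou` and the dictionary keys are ours.

```python
def unfair_pigou(eps):
    """Pigou variant: upper link latency 2 - eps (constant), lower link latency x.

    Assumes 0 < eps < 1 and traffic rate 1.
    """
    upper_latency = 2.0 - eps
    # Nash flow: the lower link costs at most 1 < 2 - eps, so all traffic uses it.
    nash_cost = 1.0
    # Optimal flow: marginal costs 2x (lower) and 2 - eps (upper) agree at x = 1 - eps/2.
    x_lower = 1.0 - eps / 2.0
    opt_cost = x_lower * x_lower + (1.0 - x_lower) * upper_latency
    return {
        "nash_cost": nash_cost,
        "opt_cost": opt_cost,
        "opt_latency_lower": x_lower,        # latency seen by the majority
        "opt_latency_upper": upper_latency,  # latency seen by the sacrificed traffic
    }


print(unfair_pigou(0.1))
```

The total latency of the optimal flow works out to $1 - \epsilon^2/4$, only slightly below the Nash cost of 1, while the traffic on the upper link experiences latency $2 - \epsilon$, nearly twice what it experiences at Nash equilibrium.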

The  “unfairness”  of  optimal  flows  is  an  unfortunate  property.  There  is  good  news, 
however;  for  networks  in  which  all  traffic  shares  the  same  source  and  destination, 
we  can  quantify  this  unfairness  and  prove  that  it  cannot  be  too  large.  Since  this 
issue  is  somewhat  removed  from  the  main  themes  of  this  thesis,  we  defer  a  further 
discussion of it to Section A.3 of Appendix A.
2.5  Existence  and  Uniqueness  of  Nash  Flows

In  this  section,  we  exploit  the  similarity  between  the  characterizations  of  Nash  and 
of minimum-latency flows (Propositions 2.2.2 and 2.3.1) to prove the existence and
essential  uniqueness  of  flows  at  Nash  equilibrium.  This  result  is  originally  due  to 
Beckmann  et  al.  [17]  and  was  later  reproved  by  Dafermos  and  Sparrow  [50];  we 
include  a  proof  for  completeness. 

Proposition 2.5.1 ([17, 50]) An instance $(G, r, \ell)$ with continuous, nondecreasing
latency functions admits a flow at Nash equilibrium. Moreover, if $f$, $\tilde f$ are flows at
Nash equilibrium, then $\ell_e(f_e) = \ell_e(\tilde f_e)$ for each edge $e$.

Proof. Set $h_e(x) = \int_0^x \ell_e(t)\,dt$. By continuity of the latency function $\ell_e$, the function
$h_e$ is differentiable with nondecreasing derivative $\ell_e$ and is therefore convex. Now
consider the convex program

$$
\begin{array}{lll}
\text{Min} & \sum_{e \in E} h_e(f_e) & \\
\text{subject to:} & \sum_{P \in \mathcal{P}_i} f_P = r_i & \forall i \in \{1, \ldots, k\} \\
 & f_e = \sum_{P \in \mathcal{P} : e \in P} f_P & \forall e \in E \\
 & f_P \ge 0 & \forall P \in \mathcal{P}
\end{array}
\tag{NLP2}
$$

and observe that the optimality conditions of Proposition 2.3.1 are identical to the
characterization of flows at Nash equilibrium in Proposition 2.2.2. The optimal
solutions for (NLP2) are thus precisely the flows at Nash equilibrium for $(G, r, \ell)$.
Existence of a flow at Nash equilibrium for $(G, r, \ell)$ then follows from the facts that
(NLP2) has a continuous objective function and a compact feasible region. Next,
suppose $f$, $\tilde f$ are flows at Nash equilibrium for $(G, r, \ell)$ (and hence global optima for
(NLP2)). By convexity of the objective function of (NLP2), whenever $f \ne \tilde f$ the
objective function must be linear between these two values; otherwise any convex
combination of $f$, $\tilde f$ would be a feasible solution for (NLP2) with smaller objective
function value. Since the objective function is convex separable, $h_e$ must be linear
between $f_e$ and $\tilde f_e$ for each edge $e$. By continuity of each latency function $\ell_e$, each $\ell_e$
must be constant between $f_e$ and $\tilde f_e$. This implies that $\ell_e(f_e) = \ell_e(\tilde f_e)$ for all $e \in E$,
and the proof is complete. ■
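To make the role of the convex program concrete, the following sketch minimizes the objective $\sum_e h_e(f_e)$ of (NLP2) and the total latency over the same feasible region for Pigou’s example (upper link latency 1, lower link latency $x$, rate 1). It is a minimal illustration, not the method of [17, 50]: the use of ternary search and all function names are our assumptions, and the closed forms for $h_e$ are hard-coded for this one network.

```python
def argmin_convex(g, lo, hi, iters=200):
    """Ternary search for a minimizer of a convex function g on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if g(m1) <= g(m2):
            hi = m2  # a minimizer lies in [lo, m2]
        else:
            lo = m1  # a minimizer lies in [m1, hi]
    return (lo + hi) / 2.0


def potential(x, rate=1.0):
    """Objective of (NLP2) when x units use the lower link.

    h_lower(x) = integral of t dt from 0 to x = x**2 / 2,
    h_upper(y) = integral of 1 dt from 0 to y = y.
    """
    return x * x / 2.0 + (rate - x)


def total_latency(x, rate=1.0):
    """Total latency C(f) when x units use the lower link."""
    return x * x + (rate - x) * 1.0


nash_lower = argmin_convex(potential, 0.0, 1.0)      # flow on the lower link at Nash equilibrium
opt_lower = argmin_convex(total_latency, 0.0, 1.0)   # flow on the lower link in the optimal flow
print(f"Nash flow routes {nash_lower:.6f} on the lower link; optimal flow routes {opt_lower:.6f}")
```

The two objectives share a feasible region but have different minimizers, which is exactly the gap between Nash and optimal flows in Pigou’s example.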
Remark  2.5.2

(a) The proof of Proposition 2.5.1 shows that if each latency function $\ell_e$ is strictly
increasing, then the flow $f_e$ on edge $e$ in a Nash flow is uniquely determined
for each edge $e$. Even in this setting there may be several Nash flows, however,
as distinct flows (i.e., distinct functions on the paths $\mathcal{P}$) may induce identical
$f_e$-values on the edges.

(b)  In  the  absence  of  strictly  increasing  latency  functions,  different  Nash  flows  may 
place  different  amounts  of  flow  on  the  same  edge — consider  the  trivial  example 
of  two  nodes  and  two  parallel  links  endowed  with  the  constant  latency  function 
$\ell(x) = 1$ (every flow in this network is at Nash equilibrium). Thus in some
sense  the  uniqueness  statement  of  Proposition  2.5.1  is  the  strongest  possible. 

(c)  Both  the  existence  and  uniqueness  conclusions  of  Proposition  2.5.1  fail  if  the 
hypothesis  of  continuity  is  omitted.  This  fact  illustrates  a  technical  distinction 
between our traffic routing model (with infinitely many players) and the classical
theory of noncooperative games developed by Nash [130] where, assuming
only finitely many players and mixed strategies but arbitrary cost functions,
a Nash equilibrium always exists. In addition, if latency functions are allowed
to be decreasing or nonmonotone, the uniqueness conclusion fails (see Section B.1
for counterexamples). For this reason and others, the assumption of
continuous,  nondecreasing  network  latency  functions  is  crucial  for  our  work. 

(d) The proof of Proposition 2.5.1 shows that flows at Nash equilibrium are precisely
the optimal solutions to a related convex program defined over the same
feasible region. Thus, under mild smoothness conditions on the network latency
functions, a flow at Nash equilibrium can be computed (up to an arbitrarily
small additive constant) in polynomial time; see Fact 2.3.6 and the
comments  thereafter  for  more  details. 

(e) The reader familiar with noncooperative game theory will recognize the unusual
ease with which we proved the existence and essential uniqueness of
flows at Nash equilibrium; in general non-zero-sum (matrix) games, establishing
the existence of Nash equilibria (in mixed strategies) requires recourse to
a nonconstructive fixed-point theorem [130]. That flows at Nash equilibrium
arise as the optimum solutions to a well-behaved optimization problem is both
useful and remarkable, and the recent study of congestion games and potential
games by the game theory community can be viewed as an ongoing quest for
broad  classes  of  games  that  share  this  property.  We  will  have  more  to  say 
about  these  two  classes  of  games  in  Section  4.4. 

(f)  The  proof  of  Proposition  2.5.1  leads  to  a  nontrivial  “quick  and  dirty”  upper 
bound  on  the  price  of  anarchy.  This  bound  is  easy  to  apply  but  does  not 
in  general  give  the  best  possible  results  (unlike  the  techniques  that  we  will 
develop  in  Chapter  3);  for  this  reason,  we  defer  further  discussion  of  this 
bound to Section A.1 in Appendix A.

In much of this thesis, we will be content with the following corollary of Proposition
2.5.1, which states that all flows at Nash equilibrium have the same cost.

Corollary 2.5.3 If $f$, $\tilde f$ are flows at Nash equilibrium for the instance $(G, r, \ell)$, then
$C(f) = C(\tilde f)$.

Proof.  The  corollary  follows  directly  from  Propositions  2.2.4  and  2.5.1.  ■ 

2.6  Acyclicity  of  Nash  Flows 

The goal of this final section is to prove that every instance $(G, r, \ell)$ admits a Nash
flow without flow cycles (thereby strengthening the existence guarantee of Proposition
2.5.1). Along the way, we will prove some useful properties about the structure
of  minimum-latency  paths  with  respect  to  a  Nash  flow. 

We begin with an extension of Proposition 2.2.2. While Proposition 2.2.2 characterizes
Nash flows as those with all $s_i$-$t_i$ flow paths having minimum latency (for
each commodity $i$), the following lemma gives an analogous characterization with
$s_i$ and $t_i$ replaced by an arbitrary pair of vertices.

Proposition 2.6.1 Let $f$ be a flow feasible for the instance $(G, r, \ell)$. For a vertex
$v$ in $G$ and a commodity $i$, let $d^i(v)$ denote the length, with respect to edge lengths
$\ell_e(f_e)$, of a shortest $s_i$-$v$ path in $G$. Then $f$ is at Nash equilibrium if and only if for
every pair $v$, $w$ of vertices in $G$, every commodity $i$, and every $v$-$w$ path $P$:

(a) $d^i(w) - d^i(v) \le \sum_{e \in P} \ell_e(f_e)$

(b) if $f_e^i > 0$ for every edge $e \in P$, then $d^i(w) - d^i(v) = \sum_{e \in P} \ell_e(f_e)$.

Proof. First suppose $f$ is feasible for $(G, r, \ell)$ and satisfies the two conditions of the
proposition. For a commodity $i$, we can take $v = s_i$ and $w = t_i$ and apply properties
(a) and (b) to find that every $s_i$-$t_i$ flow path of $f$ has minimum latency among all
$s_i$-$t_i$ paths (namely, $d^i(t_i)$). Since this holds for all commodities, Proposition 2.2.2
implies that $f$ is at Nash equilibrium.

Conversely, suppose $f$ is at Nash equilibrium for $(G, r, \ell)$. It suffices to prove
that properties (a) and (b) hold when $P$ is a single edge (for a general path, sum
up the inequalities or equalities corresponding to the constituent edges). Then, (a)
follows by definition of $d^i(v)$ and $d^i(w)$. To prove (b), consider a commodity $i$ and
an edge $e = (v, w)$ with $f_e^i > 0$ and suppose for contradiction that $d^i(w) < d^i(v) + \ell_e(f_e)$.
Let $P_e$ denote an $s_i$-$t_i$ path containing $e$ with $f_{P_e} > 0$. We may obtain another $s_i$-$t_i$
path $P'$ via the union of a shortest $s_i$-$w$ path and the $w$-$t_i$ path contained in $P_e$.
Since the latency of the $s_i$-$w$ path contained in $P_e$ is at least $d^i(v) + \ell_e(f_e) > d^i(w)$,
we have $\ell_{P_e}(f) > \ell_{P'}(f)$; by Proposition 2.2.2, this contradicts our assumption that
$f$ is at Nash equilibrium. ■

It is important to note that the path $P$ in the statement of Proposition 2.6.1
does not need to be a subpath of any flow path of $f$; in particular, in property (b)
the flow on different edges of $P$ can be carried by distinct flow paths of $f$.
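The single-edge form of conditions (a) and (b) is easy to test mechanically. The sketch below does so for one commodity, given a flow on edges; it is an illustration under our own assumptions, not part of the original development. The helper names, the use of Bellman-Ford, and the test network (the familiar Braess network, assumed here to have latencies $x$, 1, 1, $x$ and 0) are all ours.

```python
def shortest_distances(nodes, lengths, source):
    """Bellman-Ford distances from source; lengths maps (v, w) -> nonnegative length."""
    dist = {v: float("inf") for v in nodes}
    dist[source] = 0.0
    for _ in range(len(nodes) - 1):
        for (v, w), length in lengths.items():
            if dist[v] + length < dist[w]:
                dist[w] = dist[v] + length
    return dist


def satisfies_conditions(nodes, latency, edge_flow, source, tol=1e-9):
    """Check (a) and (b) edge by edge for a single-commodity flow given on edges."""
    lengths = {e: latency[e](edge_flow[e]) for e in latency}
    d = shortest_distances(nodes, lengths, source)
    for (v, w), length in lengths.items():
        gap = d[w] - d[v]
        if gap > length + tol:  # (a); always holds for shortest-path distances
            return False
        if edge_flow[(v, w)] > tol and abs(gap - length) > tol:  # (b)
            return False
    return True


# Hypothetical test instance: Braess network with one unit of s-t traffic.
NODES = ["s", "v", "w", "t"]
LATENCY = {
    ("s", "v"): lambda x: x,
    ("s", "w"): lambda x: 1.0,
    ("v", "t"): lambda x: 1.0,
    ("w", "t"): lambda x: x,
    ("v", "w"): lambda x: 0.0,
}
nash_flow = {("s", "v"): 1.0, ("s", "w"): 0.0, ("v", "t"): 0.0, ("w", "t"): 1.0, ("v", "w"): 1.0}
split_flow = {("s", "v"): 0.5, ("s", "w"): 0.5, ("v", "t"): 0.5, ("w", "t"): 0.5, ("v", "w"): 0.0}

print(satisfies_conditions(NODES, LATENCY, nash_flow, "s"))
print(satisfies_conditions(NODES, LATENCY, split_flow, "s"))
```

In the split flow, edge $(s, w)$ carries flow yet has latency 1 while $d(w) - d(s) = 1/2$, so condition (b) fails and the flow is not at Nash equilibrium.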

By a flow cycle for commodity $i$ of a flow $f$ we mean a collection $C$ of edges
in the underlying graph that form a directed cycle and for which $f_e^i > 0$ for all
$e \in C$; we emphasize that flow on different edges may be carried by different $s_i$-$t_i$
flow paths. Call a flow $f$ acyclic if it has no flow cycles (for any commodity). We
now prove that every instance admits an acyclic Nash flow, a fact that we believe
to be “folklore”.

Proposition 2.6.2 An instance $(G, r, \ell)$ admits an acyclic flow at Nash equilibrium.

Proof. An instance $(G, r, \ell)$ admits a (not necessarily acyclic) Nash flow $f$ by
Proposition 2.5.1. We will first show that flow cycles must comprise only zero-latency
edges, and will then show how to remove such cycles.

For a commodity $i$, define the $s_i$-$v$ distance $d^i(v)$ of a vertex $v$ with respect to
the flow $f$ as in Proposition 2.6.1. By Proposition 2.6.1 and nonnegativity of edge
latencies, if edge $e = (v, w)$ carries flow then $d^i(w) \ge d^i(v)$. Thus, in a directed cycle
$C$ of flow edges (i.e., of edges $e$ satisfying $f_e^i > 0$), all vertices of $C$ have equal
$d^i$-values and hence (again by Proposition 2.6.1) all edges of $C$ must have zero latency
with respect to $f$.

We next wish to remove zero-latency flow cycles from $f$; this is not entirely trivial
as the flow on different edges of a flow cycle may be carried by different flow paths
(recall $f$ is defined as a function on paths, rather than on edges). We extract a new
feasible flow $\tilde f$ from $f$ by running the following procedure for $i = 1, 2, \ldots, k$:

(1) view $f^i$ as a function on edges with $f_e^i = \sum_{P \in \mathcal{P}_i : e \in P} f_P$

(2) repeatedly discard flow cycles from $f^i$ to obtain an $s_i$-$t_i$ flow $\tilde f^i$ (still defined
only on edges) without flow cycles

(3) let $\tilde f^i$ (now as a function on paths) be an arbitrary path decomposition of the
edge flow obtained in step (2).

The reader unfamiliar with path decompositions and the discarding of flow cycles
should consult any text on network flow, such as Tarjan [179].

The flow $\tilde f$ is acyclic by construction and is feasible for $(G, r, \ell)$ since only flow
cycles were removed from the feasible flow $f$; it remains only to show that $\tilde f$ is at
Nash equilibrium. For each edge $e$, we have either $\tilde f_e = f_e$ or $\tilde f_e < f_e$ with $\ell_e(f_e) = 0$
(and hence $\ell_e(\tilde f_e) = 0$). It follows that $\ell_e(\tilde f_e) = \ell_e(f_e)$ for every edge $e$, which in
turn implies that the flows $f$ and $\tilde f$ induce identical $d^i$-values on the vertices of $G$
for every commodity $i$. Appealing to the characterization of Nash flows given in
Proposition 2.6.1, that $f$ is a Nash flow implies that $\tilde f$ is, as well. ■
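Step (2) of the procedure in the proof, the repeated discarding of flow cycles from a flow given on edges, can be sketched as follows. This is one straightforward way to carry it out, under our own assumptions (depth-first search to locate a cycle, cancellation of the bottleneck amount); the function names and the small example flow are hypothetical, and the example illustrates only the combinatorial step, not the latency argument.

```python
def find_flow_cycle(flow, tol=1e-12):
    """Return a list of edges forming a directed cycle of positive-flow edges, or None."""
    succ = {}
    for (v, w), amount in flow.items():
        if amount > tol:
            succ.setdefault(v, []).append(w)
    state = {}  # vertex -> "active" (on the DFS stack) or "done"

    def dfs(v, path):
        state[v] = "active"
        path.append(v)
        for w in succ.get(v, []):
            if state.get(w) == "active":
                cycle_vertices = path[path.index(w):] + [w]
                return list(zip(cycle_vertices, cycle_vertices[1:]))
            if w not in state:
                found = dfs(w, path)
                if found:
                    return found
        path.pop()
        state[v] = "done"
        return None

    for v in list(succ):
        if v not in state:
            found = dfs(v, [])
            if found:
                return found
    return None


def remove_flow_cycles(flow, tol=1e-12):
    """Repeatedly cancel flow cycles; each round zeroes at least one edge, so it terminates."""
    flow = dict(flow)
    while True:
        cycle = find_flow_cycle(flow, tol)
        if cycle is None:
            return flow
        bottleneck = min(flow[e] for e in cycle)
        for e in cycle:
            flow[e] -= bottleneck


# Hypothetical edge flow of value 1 from s to t containing the flow cycle a -> b -> a.
example = {("s", "a"): 1.0, ("a", "b"): 1.5, ("b", "a"): 0.5, ("b", "t"): 1.0}
print(remove_flow_cycles(example))
```

Cancelling a cycle preserves flow conservation at every vertex and leaves the value of the flow unchanged, which is why feasibility is maintained in the proof.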


Part  II 

Bounding  the  Price  of  Anarchy 
Chapter  3

How  Bad  is  Selfish  Routing? 


3.1  Introduction 

In  this  chapter  we  quantify  the  inefficiency  of  Nash  equilibria  in  the  traffic  model 
described  in  the  previous  two  chapters.  Recall  from  our  previous  examples  that 
traffic  flows  at  Nash  equilibrium  (flows  in  which  no  network  user  has  an  incentive  to 
reroute  its  traffic)  do  not  in  general  minimize  the  total  latency,  our  measure  of  social 
welfare. In this chapter we study the cost of routing selfishly via the following question:
given an arbitrary multicommodity flow network with congestion-dependent
edge  latencies,  what  is  the  worst-possible  ratio  between  the  total  latency  of  a  flow  at 
Nash  equilibrium  and  that  of  the  best  coordinated  outcome — of  a  flow  minimizing 
the  total  latency? 

3.1.1  Summary  of  Results 

As  discussed  in  Subsection  1.3.1,  this  worst-case  ratio  (“the  price  of  anarchy”  [142]) 
depends  crucially  on  the  “steepness”  of  the  network  latency  functions.  We  will 
present our techniques for computing the price of anarchy in an incremental fashion,
with each section considering successively more general classes of edge latency
functions. 

The simplest nontrivial networks are those with linear latency functions (where
every edge latency function is of the form $\ell(x) = ax + b$ for $a, b \ge 0$); for this reason
and others, such networks have been studied extensively in the past [71, 72, 143,
144, 175, 176]. Our first result is that in any multicommodity flow network with
linear latency functions, the total latency of a flow at Nash equilibrium is at most
4/3 times that of a minimum-latency flow (with a matching lower bound provided by
any of the first three examples of Section 2.4). We also demonstrate how this result
provides a quantitative limit on the extent to which counterintuitive phenomena
can occur in certain physical systems, such as the strings and springs example of
Figure 1.3.

Description                      Typical Representative                                   Price of Anarchy
Linear                           $ax + b$                                                 $\frac{4}{3} \approx 1.333$
Quadratic                        $ax^2 + bx + c$                                          $\frac{3\sqrt{3}}{3\sqrt{3} - 2} \approx 1.626$
Cubic                            $ax^3 + bx^2 + cx + d$                                   $\frac{4\sqrt[3]{4}}{4\sqrt[3]{4} - 3} \approx 1.896$
Polynomials of degree $\le p$    $\sum_{i=0}^{p} a_i x^i$                                 $\frac{(p+1)\sqrt[p]{p+1}}{(p+1)\sqrt[p]{p+1} - p} = \Theta(p / \ln p)$
M/M/1 Delay Functions            $(u - x)^{-1}$                                           $\frac{1}{2}\left(1 + \sqrt{\frac{u_{\min}}{u_{\min} - R_{\max}}}\right)$
M/G/1 Delay Functions            $\frac{1}{u} + \frac{x(1 + \sigma^2 u^2)}{2u(u - x)}$    See Section 3.5

Table 3.1: The price of anarchy for common classes of edge latency functions. Polynomial
coefficients are assumed nonnegative. The parameters $u$ and $\sigma$ are the expectation
and standard deviation of the associated queue service rate distribution.
$R_{\max}$ denotes the maximum allowable amount of network traffic, and $u_{\min}$ denotes
the minimum allowable edge service rate (or capacity).
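The polynomial entries of Table 3.1 are easy to evaluate numerically. The sketch below is a small illustration; the function name `poa_polynomial` is ours, and the formula is the one listed in the table for polynomials of degree at most $p$ with nonnegative coefficients.

```python
def poa_polynomial(p):
    """Price of anarchy listed in Table 3.1 for polynomials of degree at most p."""
    root = (p + 1) ** (1.0 / p)  # the p-th root of p + 1
    return (p + 1) * root / ((p + 1) * root - p)


for p in (1, 2, 3, 10, 100):
    print(f"degree <= {p:3d}: price of anarchy = {poa_polynomial(p):.3f}")
```

For $p = 1, 2, 3$ this reproduces the linear, quadratic, and cubic rows of the table. The same expression equals $1 / (1 - p \cdot (p+1)^{-(p+1)/p})$, the ratio attained by the nonlinear variant of Pigou’s example in Subsection 2.4.4.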

The assumption of linear latency functions is quite restrictive; we next demonstrate
how to enhance the techniques developed for networks with linear latency
functions to compute the price of anarchy with respect to an (almost) arbitrary
class of latency functions. We apply this generalization to additional classes of
latency functions that are well-studied in the networking and queueing theory literature;
in particular, all of the results shown in Table 3.1 follow directly from these
techniques. 

These methods also show that the underlying network topology plays no role in
the determination of the price of anarchy. Specifically, we show that under weak
hypotheses on the class of allowable latency functions,¹ the worst-case ratio between
the total latency of a flow at Nash equilibrium and that of a minimum-latency flow
in any multicommodity flow network is achieved by a single-commodity instance on
a set of parallel links. In the special case of a class of latency functions that includes
all of the constant functions, we prove that a network with only two parallel links
suffices to achieve the worst-possible ratio. Informally, these results imply that
the inefficiency inherent in a flow at Nash equilibrium stems from the inability of
selfish users to discern which of two competing routes is superior and not from the
topological complexity arising from the diverse intersections of many paths belonging
to different commodities.

¹ For example, it suffices for the class to satisfy a mild convexity assumption, to be closed under
multiplication by positive scalars, and to possess some latency function that is positive when
evaluated with zero congestion. Almost all classes of latency functions previously considered in
the literature meet the required hypotheses.

Finally,  we  employ  a  bicriteria  approach  to  bound  the  inefficiency  of  Nash  flows 
in  networks  with  arbitrary  latency  functions  (where  our  previous  work  shows  that 
Nash  flows  may  be  arbitrarily  more  costly  than  optimal  flows).  We  show  that  in  a 
multicommodity  flow  network  with  latency  functions  assumed  only  to  be  continuous 
and  nondecreasing,  the  total  latency  incurred  by  traffic  at  Nash  equilibrium  is  at 
most  that  of  a  minimum-latency  flow  forced  to  route  twice  as  much  traffic  between 
each source-destination pair. We show that this result also has the following alternative
interpretation: in lieu of centralized control, the price of routing selfishly can
be  offset  by  a  moderate  increase  in  link  speed  (which  for  queueing  delay  functions 
can  be  effected  by  a  moderate  increase  in  the  network  capacity). 
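The bicriteria statement can be sanity-checked on a single family of instances. The sketch below does this for the two-link network with latencies 1 and $x^p$ from Subsection 2.4.4; it checks one special case only and is in no way a proof. The function name `optimal_cost` is ours.

```python
def optimal_cost(p, rate):
    """Minimum total latency on two parallel links with latencies 1 and x**p."""
    # The lower link is filled until its marginal cost (p + 1) * x**p reaches 1;
    # any remaining traffic uses the constant-latency upper link.
    x_lower = min(rate, (p + 1) ** (-1.0 / p))
    return x_lower ** (p + 1) + (rate - x_lower) * 1.0


NASH_COST_RATE_ONE = 1.0  # at rate 1 all traffic uses the lower link at latency 1

for p in (1, 10, 100, 1000):
    print(
        f"p={p:4d}  C(opt, rate 1)={optimal_cost(p, 1.0):.4f}  "
        f"C(Nash, rate 1)={NASH_COST_RATE_ONE:.4f}  C(opt, rate 2)={optimal_cost(p, 2.0):.4f}"
    )
```

Although the Nash flow can be far more costly than the optimal flow at the same rate, in every row its cost is at most that of the optimal flow routing twice as much traffic.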

3.1.2  Related  Work 

Traffic  Equilibria 

Unregulated  traffic  has  been  modeled  as  network  flow  with  all  flow  paths  between 
a  given  source-destination  pair  having  minimum  latency  since  the  1950’s  [17,  186] 
(though  as  mentioned  in  Chapter  1,  Pigou  [148]  and  Knight  [99]  informally  discussed 
a  similar  model  thirty  years  earlier).  Wardrop  [186]  formulated  the  notions  of  a  flow 
at Nash equilibrium and of a minimum-latency flow, and suggested that these were
the two types of traffic flows deserving special study. Beckmann et al. [17], observing
that a flow at Nash equilibrium is an optimal solution to a related convex program
(see Section 2.3), gave existence and uniqueness results for traffic equilibria. Dafermos
and Sparrow [50] reproved many of the results of [17] and were also interested
in  computing  flows  at  Nash  equilibrium  efficiently;  in  particular,  they  proposed  two 
iterative  algorithms  for  computing  Nash  flows  and  proved  convergence  results  for 
certain  networks. 

Since these early works, the traffic model studied in this dissertation has been
generalized in many different directions. In the transportation science literature,
this model has been generalized to allow for the latency of an edge to depend on the
entire traffic pattern (rather than merely on the flow using that edge), to include
different types (or “modes”) of traffic, to allow elastic (rather than fixed) traffic
rates, and so on; research on these more general models has focused on establishing
existence  and  uniqueness  of  traffic  equilibria  [1,  26,  32,  45,  46,  47,  66,  67,  68,  84,  85, 
102,  129,  173,  192],  on  designing  algorithms  to  compute  an  equilibrium  [1,  11,  18, 
45,  46,  65,  66,  67,  68,  69,  78,  115,  118,  129,  134,  135,  187,  192],  and  on  sensitivity 
analysis  [45,  49,  83,  190].  For  an  introduction  to  this  literature,  we  recommend 
the  survey  of  Florian  and  Hearn  [68]  and  the  book  of  Sheffi  [169].  More  recently, 
Nesterov  [131]  (see  also  Nesterov  and  De  Palma  [132])  has  proposed  an  interesting 
alternative  to  (rather  than  a  generalization  of)  the  traffic  model  considered  in  this 
thesis. 

In  the  networking  literature,  many  recent  papers  alter  the  model  considered  here 
by  relaxing  the  assumption  that  every  network  user  controls  a  negligible  fraction  of 
the  overall  traffic  (as  will  we  in  Sections  4.2  and  4.3);  rather,  a  finite  number  of 
users  each  control  a  positive  amount  of  flow.  Most  of  these  papers  allow  players 
to  split  their  flow  among  several  routes  but  disallow  randomization,  and  then  give 
necessary  and  sufficient  conditions  (on  the  network  topology,  the  amount  of  flow  that 
users  control,  and  on  the  edge  latency  functions)  for  the  existence  and  uniqueness 
of  (pure-strategy)  Nash  equilibria  [4,  8,  25,  59,  138]  and  for  convergence  to  a  Nash 
equilibrium  under  natural  models  of  user  behavior  [5,  138].  Some  of  these  results 
have  been  extended  to  networks  in  which  users  cannot  split  flow  and  must  instead 
route  all  of  their  traffic  on  a  single  path  [117]. 

Finally,  the  game  theory  community  has  generalized  the  traffic  model  studied 
here  to  broad  classes  of  games  that  need  not  take  place  in  a  network.  This  literature 
(which  we  will  review  in  Section  4.4,  when  we  study  a  related  class  of  games)  aims 
to  identify  games  that  enjoy  many  of  the  desirable  properties  possessed  by  traffic 
equilibria,  such  as  existence  of  Nash  equilibria  in  pure  strategies,  rather  than  to 
model  any  specific  type  of  application. 

The  Price  of  Anarchy 

In contrast to the previous work on traffic equilibria, we are interested in quantifying
the difference in social welfare between equilibrium and optimal traffic flows.
The idea of bounding the inefficiency of Nash equilibria was first proposed by Koutsoupias
and Papadimitriou [108] for the following simple load-balancing model. A
finite number of users share a collection of parallel links, and each user chooses a
probability distribution on the set of links (specifying the probability that the user
will route all of its flow on a given link). Each user wishes to minimize the expected
congestion it will experience, while the global objective is to minimize the expected
load on the most congested edge; the worst-case Nash equilibrium is then compared
to a globally optimal choice of distributions. Koutsoupias and Papadimitriou [108]
obtained a tight analysis of this worst-case ratio (which they call the coordination
ratio) in two-node, two-link networks and partial results for two-node networks with
three or more parallel links; tight results were subsequently found for parallel networks
with any number of links by Mavronicolas and Spirakis [122] (for a special
case) and by Czumaj and Vöcking [44] (for the general case). More recently, the
original  model  of  [108]  has  been  generalized  by  Czumaj  et  al.  [43]  (for  example,  to 
include  more  general  objective  functions),  who  also  prove  a  variety  of  facts  about 
the  coordination  ratio  in  this  more  general  model. 

3.1.3  Organization 

In Sections 3.2-3.5 we present our methods for computing the price of anarchy;
these sections should be read in order. In Section 3.2 we study networks with linear
latency functions and prove that the price of anarchy for such networks is 4/3. We
also demonstrate the connection between networks with linear latency functions and
the networks of strings and springs advertised in Subsection 1.3.1. In Sections 3.3
and 3.4 we generalize these techniques to show that the price of anarchy is independent
of the network topology, and in Section 3.5 we show how this fact permits
computation  of  the  price  of  anarchy  with  respect  to  an  arbitrary  class  of  latency 
functions.  Section  3.5  also  illustrates  our  techniques  by  computing  the  price  of 
anarchy  for  important  classes  of  latency  functions  not  considered  earlier. 

Finally,  in  Section  3.6  we  prove  that  a  Nash  flow  in  a  network  with  arbitrary 
latency  functions  costs  no  more  than  an  optimal  flow  forced  to  route  twice  as  much 
traffic.  This  section  does  not  depend  on  earlier  results  of  this  chapter  and  can  be 
read  immediately  following  Section  2.2. 
3.2  The  Price  of  Anarchy  with  Linear  Latency  Functions

In this section, we consider the scenario where the latency of each edge $e$ is linear in
the edge congestion; that is, where for each edge $e \in E$, $\ell_e(x) = a_e x + b_e$ for some
$a_e, b_e \ge 0$. This is the setting in which Braess’s paradox was originally discovered [28,
128], and several subsequent papers focused entirely on this model [71, 72, 143, 144,
175, 176]. In addition, linear latency functions are important for other applications:
we will see later in this section that the mechanical networks of strings and springs
described in Subsection 1.3.1 can be modeled as traffic networks with linear latency
functions, and Friedman [74] shows how linear latency functions naturally arise in a
simple model of selfish users transferring files over a network employing a congestion
control protocol (such as TCP).

We have already seen in Subsections 2.4.1-2.4.3 three examples with linear latency
functions for which the ratio of the cost of a flow at Nash equilibrium and
the cost of an optimal flow is 4/3. Our main result for this section (Theorem 3.2.6)
is  a  matching  upper  bound  for  networks  with  linear  latency  functions.  The  proof 
techniques  of  this  section  will  also  form  the  basis  for  our  subsequent  work  bounding 
the  price  of  anarchy  for  networks  with  nonlinear  latency  functions. 

To make these statements precise, we require some additional notation. For an
instance $(G, r, \ell)$ admitting an optimal flow $f^*$ and a flow at Nash equilibrium $f$, we
denote the ratio $C(f)/C(f^*)$ by $\rho = \rho(G, r, \ell)$; this ratio is well defined by Corollary 2.5.3.

We  begin  by  noting  that  the  propositions  of  Section  2.3  have  particularly  simple 
and  useful  forms  in  the  special  case  of  networks  with  linear  latency  functions.  First, 
the total latency $C(f)$ of a flow $f$ is given by $C(f) = \sum_e (a_e f_e^2 + b_e f_e)$; since $a_e \ge 0$ for
all $e$, the nonlinear program (NLP) of Section 2.3 is a convex (quadratic) program
and thus Proposition 2.3.1 characterizes its optimal solutions. Also, in the notation
of Section 2.3, if the latency function $\ell_e$ of edge $e$ is $\ell_e(x) = a_e x + b_e$, then the
marginal cost function $\ell^*_e$ of $e$ is simply $\ell^*_e(x) = 2 a_e x + b_e$. For convenience, we
summarize  this  discussion  together  with  specialized  versions  of  Propositions  2.2.2 
and  2.3.1  in  the  following  lemma. 

Lemma 3.2.1 Let $(G, r, \ell)$ be an instance with edge latency functions $\ell_e(x) = a_e x + b_e$
for each $e \in E$. Then,

(a) a feasible flow $f$ is at Nash equilibrium for $(G, r, \ell)$ if and only if for each
commodity $i$ and $P, P' \in \mathcal{P}_i$ with $f_P > 0$,

$$\sum_{e \in P} (a_e f_e + b_e) \;\le\; \sum_{e \in P'} (a_e f_e + b_e);$$

(b) a feasible flow $f^*$ is optimal for $(G, r, \ell)$ if and only if for each commodity $i$
and $P, P' \in \mathcal{P}_i$ with $f^*_P > 0$,

$$\sum_{e \in P} (2 a_e f^*_e + b_e) \;\le\; \sum_{e \in P'} (2 a_e f^*_e + b_e).$$
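
As a sanity check of Lemma 3.2.1, the following Python sketch tests conditions (a) and (b) on a network of parallel links, where every path consists of a single edge. The helper names (`total_latency`, `is_nash`, `is_optimal`), the tolerance, and the two-link data (latencies 1 and $x$, traffic rate 1) are illustrative assumptions rather than notation from the text, and the sketch does not verify that a flow is feasible for a given rate.

```python
from typing import List, Tuple

Edge = Tuple[float, float]  # (a_e, b_e), i.e. latency a_e * x + b_e


def total_latency(edges: List[Edge], flow: List[float]) -> float:
    """C(f) = sum over edges of a_e * f_e^2 + b_e * f_e."""
    return sum(a * x * x + b * x for (a, b), x in zip(edges, flow))


def _used_links_are_cheapest(costs: List[float], flow: List[float], tol: float) -> bool:
    """True if every link that carries flow attains the minimum cost."""
    lowest = min(costs)
    return all(c <= lowest + tol for c, x in zip(costs, flow) if x > tol)


def is_nash(edges: List[Edge], flow: List[float], tol: float = 1e-9) -> bool:
    """Condition (a) on parallel links: compare latencies a_e * f_e + b_e."""
    latencies = [a * x + b for (a, b), x in zip(edges, flow)]
    return _used_links_are_cheapest(latencies, flow, tol)


def is_optimal(edges: List[Edge], flow: List[float], tol: float = 1e-9) -> bool:
    """Condition (b) on parallel links: compare marginal costs 2 * a_e * f_e + b_e."""
    marginals = [2 * a * x + b for (a, b), x in zip(edges, flow)]
    return _used_links_are_cheapest(marginals, flow, tol)


# Two parallel links with latencies 1 and x, traffic rate 1 (hypothetical data).
links = [(0.0, 1.0), (1.0, 0.0)]
nash_flow = [0.0, 1.0]   # everything on the link with latency x
opt_flow = [0.5, 0.5]    # traffic split evenly

print(is_nash(links, nash_flow), is_optimal(links, opt_flow))
print(total_latency(links, nash_flow) / total_latency(links, opt_flow))
```

On parallel links the two conditions differ only in the factor 2 multiplying $a_e f_e$, which is why the same helper serves both checks.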

As  an  aside,  we  note  that  Lemma  3.2.1  immediately  gives  a  simple  proof  of  the 
following  nontrivial  result  regarding  networks  in  which  the  latency  of  each  edge  is 
proportional  to  its  congestion;  this  result  is  implicit  in  the  work  of  Dafermos  and 
Sparrow  [50],  and  other  properties  of  this  special  case  have  been  investigated  in  the 
context  of  electrical  networks  [24,  39]. 

Corollary 3.2.2 Let $G$ be a network in which each edge latency function $\ell_e$ is of
the form $\ell_e(x) = a_e x$. Then for any rate vector $r$, a flow feasible for $(G, r, \ell)$ is
optimal if and only if it is at Nash equilibrium.

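Proof. When $b_e = 0$ for every edge $e$, the inequality of Lemma 3.2.1(b) is exactly
twice the inequality of Lemma 3.2.1(a). A feasible flow for such an instance therefore
satisfies the conditions of Lemma 3.2.1(a) if and only if it satisfies the conditions
of Lemma 3.2.1(b). ■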

A  second  corollary  of  Lemma  3.2.1  will  play  a  crucial  role  in  our  proof  of  the 
main  theorem  of  this  section. 

Lemma 3.2.3 Suppose $(G, r, \ell)$ has linear latency functions and $f$ is a flow at Nash
equilibrium. Then,

(a) the flow $f/2$ is optimal for $(G, r/2, \ell)$;

(b) the marginal cost of increasing the flow on a path $P$ with respect to $f/2$ equals
the latency of $P$ with respect to $f$.

Proof. For part (a), simply note that if $f$ satisfies the conditions of Lemma 3.2.1(a)
for $(G, r, \ell)$, then $f/2$ satisfies the conditions of Lemma 3.2.1(b) for $(G, r/2, \ell)$. For
the second part, recall that if edge $e$ has latency function $\ell_e(x) = a_e x + b_e$ then $e$
has marginal cost function $\ell_e^*(x) = 2 a_e x + b_e$. Thus, $\ell_e^*(f_e/2) = \ell_e(f_e)$ for each edge
$e$ and hence $\ell_P^*(f/2) = \ell_P(f)$ for each path $P$. ■
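
The substitution behind Lemma 3.2.3 can be written out explicitly. The display below is a worked restatement that uses only the linear form $\ell_e(x) = a_e x + b_e$ assumed in this section, and takes $\ell_P^*$ to be the sum of the edge marginal costs along $P$ (the quantity compared in Lemma 3.2.1(b)).

```latex
\begin{align*}
\ell_e^*(f_e/2) &= 2 a_e \cdot \frac{f_e}{2} + b_e = a_e f_e + b_e = \ell_e(f_e), \\
\ell_P^*(f/2) &= \sum_{e \in P} \ell_e^*(f_e/2) = \sum_{e \in P} \ell_e(f_e) = \ell_P(f).
\end{align*}
```

Consequently the path inequalities of Lemma 3.2.1(b) for $f/2$ are literally the inequalities of Lemma 3.2.1(a) for $f$, and a path carries flow in $f/2$ exactly when it carries flow in $f$; this is part (a).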

An outline of the proof of the main theorem is as follows. It will be useful to
think about creating an optimal flow for the instance $(G, r, \ell)$ via a two-step process:
in the first step, a flow optimal for the instance $(G, r/2, \ell)$ is sent through $G$ (which
from Lemma 3.2.3(a) we know to be simply half of a Nash flow for $(G, r, \ell)$), and in
the second step this flow is augmented to one optimal for $(G, r, \ell)$. It is important
to note that this augmentation may increase or decrease the amount of flow on any
given edge; e.g., in Braess’s Paradox (Figure 2.2) the Nash flow $f$ (and hence the
flow $f/2$) makes use of the zero-latency edge $(v, w)$ while the optimal flow eschews it.
We will show that the first flow has cost at least $\frac{1}{4} C(f)$ and that the augmentation
has cost at least $\frac{1}{2} C(f)$, where $f$ is some flow at Nash equilibrium.

We  will  see  in  the  proof  of  Theorem  3.2.6  that  the  first  lower  bound  follows 
easily  from  Lemma  3.2.3(a),  but  the  second  (for  the  cost  of  the  augmentation,  given 
that  the  first  flow  has  already  been  routed)  requires  more  work,  and  in  particular 
the  following  lemma.  Intuitively,  the  lemma  simply  claims  that  the  per-unit  cost 
of  increasing  the  amount  of  flow  through  a  network  is  at  least  the  marginal  cost  of 
increasing  flow  on  any  path  with  respect  to  the  current  optimal  flow. 

Lemma 3.2.4 Suppose $(G, r, \ell)$ is an instance with linear latency functions for
which $f^*$ is an optimal flow. Let $L_i^*(f^*)$ be the minimum marginal cost of increasing
flow on an $s_i$-$t_i$ path with respect to $f^*$. Then for any $\delta > 0$, a feasible flow for the
instance $(G, (1 + \delta) r, \ell)$ has cost at least

$$C(f^*) + \delta \sum_{i=1}^{k} L_i^*(f^*)\, r_i.$$

Proof. A heuristic proof of this lemma is as follows. Since the marginal cost of
increasing flow on any $s_i$-$t_i$ path with respect to $f^*$ is at least $L_i^*(f^*)$, routing $\delta r_i$
additional units of flow from $s_i$ to $t_i$ should cost at least $\delta L_i^*(f^*) r_i$. Summing over all
commodities $i$ should then yield the lemma. This argument provides good intuition
for why the lemma is true, but is not sufficient because not all feasible flows for
$(G, (1 + \delta) r, \ell)$ are obtainable from $f^*$ by simply routing additional flow through the
network.

To prove the lemma, fix $\delta > 0$ and suppose $f$ is feasible for $(G, (1 + \delta) r, \ell)$. In
general $f_e$ may be larger or smaller than $f_e^*$. For any edge $e \in E$, convexity of the
function $x \cdot \ell_e(x) = a_e x^2 + b_e x$ implies that

$$\ell_e(f_e) f_e \;\ge\; \ell_e(f_e^*) f_e^* + (f_e - f_e^*)\, \ell_e^*(f_e^*).$$

In essence, this inequality states that estimating the cost of changing the flow value
on edge $e$ from $f_e^*$ to $f_e$ by $(f_e - f_e^*)\, \ell_e^*(f_e^*)$ (i.e., by the marginal cost of flow increase
at $f_e^*$ times the size of the perturbation) only underestimates the actual cost of an
increase (when $f_e > f_e^*$) and overestimates the actual benefit of a decrease (when
$f_e < f_e^*$). We may thus derive

$$\begin{aligned}
C(f) &= \sum_{e \in E} \ell_e(f_e) f_e \\
&\ge \sum_{e \in E} \ell_e(f_e^*) f_e^* + \sum_{e \in E} (f_e - f_e^*)\, \ell_e^*(f_e^*) \\
&= C(f^*) + \sum_{i=1}^{k} \sum_{P \in \mathcal{P}_i} \ell_P^*(f^*)\, (f_P - f_P^*).
\end{aligned}$$

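Since we have $\ell_P^*(f^*) \ge L_i^*(f^*)$ for each $i$ and each $P \in \mathcal{P}_i$, with equality holding
whenever $f_P^* > 0$ (see Lemma 3.2.1(b)), and since $f_P - f_P^* \ge 0$ whenever $f_P^* = 0$, we obtain

$$\begin{aligned}
C(f) &\ge C(f^*) + \sum_{i=1}^{k} L_i^*(f^*) \sum_{P \in \mathcal{P}_i} (f_P - f_P^*) \\
&= C(f^*) + \delta \sum_{i=1}^{k} L_i^*(f^*)\, r_i,
\end{aligned}$$

completing the proof. ■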
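
Lemma 3.2.4 can be checked numerically on a small instance. The sketch below assumes a hypothetical single-commodity network of two parallel links with latencies 1 and $x$; for this instance the optimal flow at rate $r$ places at most 1/2 unit on the second link, so $C(f^*)$ and $L^*(f^*) = \min(1, 2r)$ are available in closed form. The function names and the brute-force grid are illustrative choices, not part of the text.

```python
def cost(x_const: float, total: float) -> float:
    """Total latency when x_const units use the constant-latency link
    (latency 1) and the remaining total - x_const units use the link
    with latency l(x) = x."""
    x_lin = total - x_const
    return x_const * 1.0 + x_lin * x_lin


def min_cost(total: float, steps: int = 10_000) -> float:
    """Cheapest split found on a uniform grid of feasible flows."""
    return min(cost(total * k / steps, total) for k in range(steps + 1))


def lemma_bound_holds(r: float, delta: float, tol: float = 1e-9) -> bool:
    """Check C(f) >= C(f*) + delta * L*(f*) * r for the cheapest grid flow f
    at rate (1 + delta) * r."""
    # Optimal flow at rate r: up to 1/2 unit on the x-link, the rest on the
    # constant link, so C(f*) and the minimum marginal cost are explicit.
    x_lin = min(r, 0.5)
    opt_cost = (r - x_lin) + x_lin * x_lin
    l_star = min(1.0, 2.0 * r)
    return min_cost((1.0 + delta) * r) >= opt_cost + delta * l_star * r - tol


print(all(lemma_bound_holds(r, d) for r in (0.2, 0.5, 2.0) for d in (0.1, 1.0, 3.0)))
```

For rates $r \ge 1/2$ the bound holds with equality on this instance, because all additional traffic is routed on the constant-latency link at marginal cost exactly 1.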

Remark 3.2.5 Lemma 3.2.4 and its proof remain valid in much more general settings;
all that is required is convexity of the function $x \cdot \ell_e(x)$ for each edge $e$ (i.e.,
that each latency function is standard; recall Definition 2.3.5).

We  are  now  prepared  to  prove  the  main  theorem. 

Theorem 3.2.6 If $(G, r, \ell)$ has linear latency functions, then $\rho(G, r, \ell) \le \frac{4}{3}$.

Proof. Let $f$ be a flow at Nash equilibrium for $(G, r, \ell)$. Let $L_i(f)$ be the latency
of an $s_i$-$t_i$ flow path, so that $C(f) = \sum_i L_i(f) r_i$ (see Proposition 2.2.4). By
Lemma 3.2.3(a), $f/2$ is an optimal flow for the instance $(G, r/2, \ell)$. Moreover, by
Lemma 3.2.3(b), $L_i^*(f/2) = L_i(f)$ for each $i$; in words, marginal costs with respect
to $f/2$ and latencies with respect to $f$ coincide. This establishes the necessary connection
between the cost of augmenting $f/2$ to a flow feasible for $(G, r, \ell)$ and the
cost of a flow at Nash equilibrium.

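Taking $\delta = 1$ in Lemma 3.2.4 (applied to the instance $(G, r/2, \ell)$ and its optimal
flow $f/2$), we find that the cost of any flow $f^*$ feasible for
$(G, r, \ell)$ satisfies

$$\begin{aligned}
C(f^*) &\ge C(f/2) + \sum_{i=1}^{k} L_i^*(f/2)\, \frac{r_i}{2} \\
&= C(f/2) + \frac{1}{2} \sum_{i=1}^{k} L_i(f)\, r_i \\
&= C(f/2) + \frac{1}{2} C(f).
\end{aligned}$$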

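Finally, it’s easy to lower bound the cost of $f/2$:

$$\begin{aligned}
C(f/2) &= \sum_e \left( \frac{1}{4} a_e f_e^2 + \frac{1}{2} b_e f_e \right) \\
&\ge \frac{1}{4} \sum_e \left( a_e f_e^2 + b_e f_e \right) \\
&= \frac{1}{4} C(f),
\end{aligned}$$

and thus $C(f^*) \ge \frac{3}{4} C(f)$. ■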
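
To see the two pieces of the proof at work, consider a worked instance. We assume here a two-link network with latencies $\ell_1(x) = 1$ and $\ell_2(x) = x$ and traffic rate $r = 1$ (the data are an illustrative assumption; we do not restate the examples of Section 2.4). Every step of the proof of Theorem 3.2.6 then holds with equality.

```latex
\begin{align*}
f &= (0, 1), & C(f) &= 1, & L(f) &= 1, \\
f/2 &= (0, \tfrac{1}{2}), & C(f/2) &= \tfrac{1}{4} = \tfrac{1}{4}\, C(f), \\
f^* &= (\tfrac{1}{2}, \tfrac{1}{2}), & C(f^*) &= \tfrac{1}{2} + \tfrac{1}{4} = \tfrac{3}{4}
   = C(f/2) + \tfrac{1}{2}\, C(f).
\end{align*}
```

Here $f/2$ is optimal at rate $1/2$ because its marginal cost on the second link, $2 \cdot \tfrac{1}{2} = 1$, equals the marginal cost of the unused constant link, and the augmentation from $f/2$ to $f^*$ sends the remaining half unit on the constant link at cost exactly $\tfrac{1}{2} L(f)$.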

We note that the analysis of this section can easily be extended to prove that in
an instance $(G, r, \ell)$ where for some $p$, $\ell_e(x) = a_e x^p + b_e$ (with $a_e, b_e \ge 0$) for each
edge $e$, $\rho(G, r, \ell) \le (1 - p \cdot (p + 1)^{-(p+1)/p})^{-1} = \Theta(p / \ln p)$. The nonlinear variant of
Pigou’s example (Subsection 2.4.4) shows that this result is tight. In Section 3.5 we
will see that this upper bound holds more generally for instances with polynomial
latency functions with nonnegative coefficients and any number of terms with degree
at most $p$.
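
The closed form above is easy to evaluate. The following Python sketch (the function name `poly_bound` is our own) computes the bound for a given degree $p$ and compares it with $p / \ln p$ for a few larger degrees; it is meant only to make the growth rate tangible.

```python
import math


def poly_bound(p: float) -> float:
    """Evaluate (1 - p * (p + 1)^(-(p + 1) / p))^(-1) for a degree p > 0."""
    return 1.0 / (1.0 - p * (p + 1.0) ** (-(p + 1.0) / p))


for degree in (1, 2, 3, 10, 100, 1000):
    line = f"p = {degree:5d}  bound = {poly_bound(degree):10.4f}"
    if degree > 1:
        # Ratio of the bound to p / ln p, which stays bounded as p grows.
        line += f"  bound / (p / ln p) = {poly_bound(degree) * math.log(degree) / degree:.3f}"
    print(line)
```

For $p = 1$ the expression reduces to $(1 - \tfrac{1}{4})^{-1} = \tfrac{4}{3}$, in agreement with Theorem 3.2.6.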

Consequences  for  Strings  and  Springs 

We now return to the mechanical networks of strings and springs discovered by Cohen
and Horowitz [36] and discussed in Subsection 1.3.1 and Figure 1.3. Viewing the
support as a source and the suspended weight as a destination, with each string and
spring as an edge, the equilibrium position of the mechanical device can be modeled
as a flow at Nash equilibrium in a traffic network $G$, with force corresponding to
flow and support-weight distance corresponding to the common latency of every
source-destination flow path. Strings (as perfectly inelastic objects) are modeled as
links with constant latency functions while (perfectly elastic) springs correspond to
links with latency functions that include a term of the form $ax$. Severing a string
or spring corresponds to deleting an edge from a traffic network; thus any realizable
equilibrium of the mechanical network (after possibly destroying some of its constituent
parts) corresponds to a Nash equilibrium in a subgraph of the corresponding
traffic network $G$.

Although Theorem 3.2.6 is concerned with the total latency of flows (a concept
with no natural analogue in our mechanical networks), we can use the result in the
following way. By Theorem 3.2.6, every traffic flow in $G$ (and in particular every
flow at Nash equilibrium in a subgraph of $G$) has total latency at least $\frac{3}{4}$ times that
of a Nash flow $f$ in $G$. By Proposition 2.2.4, it follows that if the common latency
of every flow path of $f$ is $L$ and $f^*$ is a flow at Nash equilibrium in a subgraph of
$G$, then the common latency of every flow path of $f^*$ is at least $\frac{3}{4} L$. Reinterpreting
this result for networks of strings and springs, we obtain the following corollary of
Theorem 3.2.6.

Corollary 3.2.7 In any network of strings and springs carrying a single weight with
support-weight distance $D$, the support-weight distance after severing an arbitrary
collection of strings and springs is at least $\frac{3}{4} D$.

Cohen and Horowitz [36] also showed, by an analogous construction, that removing
a diode from a two-terminal electrical network of resistors and diodes can
decrease the voltage drop from source to ground; thus removing a conducting link
can increase the network conductivity. By the same arguments as above, Theorem 3.2.6
implies that the voltage drop from source to ground in such an electrical
network after removing any number of resistors and diodes is at least $\frac{3}{4}$ times the
voltage drop in the original network.

3.3  The  Price  of  Anarchy  with  Standard  Latency 
Functions 

The goal of this section is to provide an upper bound on the worst-case ratio between
the cost of a Nash flow and of an optimal flow for instances with nonlinear
latency functions. As we have seen in the nonlinear variant of Pigou’s example
(Subsection 2.4.4), the price of anarchy depends crucially on how “steep” the allowable
latency functions can be, and one may therefore ask whether any meaningful
upper  bound  is  possible  for  networks  with  arbitrary  latency  functions.  The  answer 
is  affirmative,  provided  that  the  upper  bound  is  a  function  of  the  class  of  allowable 
latency  functions. 

To state the main result of this section precisely, recall that $\rho(G, r, \ell)$ denotes the
ratio between the cost of a Nash and of an optimal flow for instance $(G, r, \ell)$. We will
associate a real number $\alpha(\mathcal{L}) \ge 1$ to each class $\mathcal{L}$ of allowable edge latency functions
that quantifies the “steepness” of the latency functions in $\mathcal{L}$, and will then prove
that for any instance $(G, r, \ell)$ with latency functions in the class $\mathcal{L}$, $\rho(G, r, \ell) \le \alpha(\mathcal{L})$.
In Section 3.4 we will provide a matching lower bound, by exhibiting (for any class
$\mathcal{L}$) instances with latency functions in $\mathcal{L}$ and $\rho$-value arbitrarily close to $\alpha(\mathcal{L})$.

3.3.1  The  Anarchy  Value 

Our first task is to find a definition that captures how “steep” a given class of allowable
latency functions is. A first attempt might involve the first several
derivatives of the latency functions; for example, we might hope to prove that if the
first few derivatives of all allowable latency functions are everywhere bounded by
some universal constant, then the price of anarchy is constant. However, any instance
$(G, r, \ell)$ with latency functions of class $C^k$ (latency functions that are $k$ times
continuously differentiable) can be “scaled down” to an instance $(G, r, \frac{1}{M} \ell)$ in which
the first $k$ derivatives of all edge latency functions are as small as desired (by taking
$M$ sufficiently large). Moreover, $\rho(G, r, \frac{1}{M} \ell) = \rho(G, r, \ell)$ since a feasible flow is optimal
(respectively, Nash) for $(G, r, \frac{1}{M} \ell)$ if and only if it is optimal (respectively, Nash)
for $(G, r, \ell)$. Thus, networks with latency functions with (any number of) bounded
derivatives are not “better-behaved” than networks with polynomial latency functions;
and from the nonlinear variant of Pigou’s example (Subsection 2.4.4), we
already know that networks with polynomial latency functions and no upper bound
on the allowable degree are not well-behaved at all.

Before giving our definition capturing how “steep” a given class of allowable latency
functions is (which, admittedly, is not immediately intuitive), we will consider
a  motivating  example.  It  will  be  convenient  to  apply  Corollary  2.3.2  to  compute  the 
optimal  flow  in  this  example;  for  this  reason  and  others  that  will  become  clear  later 
in  this  section,  we  restrict  ourselves  to  networks  with  standard  latency  functions  (see 
Definition  2.3.5).  Since  the  focus  of  this  section  is  on  classes  of  allowable  latency 
functions,  we  make  another  definition. 

Definition 3.3.1 A class $\mathcal{L}$ of latency functions is standard if $\mathcal{L}$ contains a nonzero
function and if each $\ell \in \mathcal{L}$ is standard.

We now introduce the motivating example. Suppose we are given a standard
class $\mathcal{L}$ of allowable latency functions, and wish to construct a network in which the
Nash flow incurs much more latency than the optimal flow. A natural idea is to
mimic the bad example of Subsection 2.4.4 as best we can, given that $\mathcal{L}$ is the class
of latency functions that we are allowed to work with. For simplicity, assume that
the constant function $\ell_1(x) = 1$ lies in $\mathcal{L}$. Then, we can consider the usual two-node,
two-link network, assign the first link the latency function $\ell_1$ and the second link
the “steepest” latency function that we can find. More formally, suppose $\ell_2 \in \mathcal{L}$
is assigned to the second link where $\ell_2$ satisfies $\ell_2(0) < 1$ and $\ell_2(x) > 1$ for $x$
sufficiently large. Choosing $r > 0$ to satisfy $\ell_2(r) = 1$, we find that a Nash flow
with traffic rate $r$ routes all of its flow on the second edge for a total latency of
$r$. Using Corollary 2.3.2 and letting $\lambda \in (0, 1)$ satisfy $\ell_2^*(\lambda r) = 1$, we find that the
optimal flow routes $\lambda r$ units of flow on the second link and $(1 - \lambda) r$ units of flow
on the first link, for a total latency of $\lambda r\, \ell_2(\lambda r) + (1 - \lambda) r$. Letting $\mu \in [0, 1]$ denote
$\ell_2(\lambda r)$, the ratio between the total latency of the Nash flow and of the optimal flow
is $[\lambda \mu + (1 - \lambda)]^{-1}$. Taking into account that this argument can be used with $\ell_1$
replaced by any constant function, and that $\ell_2 \in \mathcal{L}$ was chosen arbitrarily, we arrive
at the following definition.

Definition 3.3.2 Let $\ell$ be a nonzero standard latency function. The anarchy value
$\alpha(\ell)$ of $\ell$ is

$$\alpha(\ell) = \sup_{r > 0 \,:\, \ell(r) > 0} [\lambda \mu + (1 - \lambda)]^{-1}$$

where $\lambda \in (0, 1)$ satisfies $\ell^*(\lambda r) = \ell(r)$ and $\mu \in [0, 1]$ is defined by $\mu = \ell(\lambda r)/\ell(r)$.

That the scalar $\lambda \in (0, 1)$ exists follows from the Intermediate Value Theorem and
the fact that $\ell^*(0) = \ell(0) \le \ell(r) \le \ell^*(r)$. In most cases of interest $\lambda$ will be uniquely
determined by $\ell$ and $r$; otherwise, our assumption that $\ell$ is standard ensures that
the anarchy value is well defined (i.e., that $[\lambda \mu + (1 - \lambda)]^{-1}$ is independent of the
choice of $\lambda$ satisfying $\ell^*(\lambda r) = \ell(r)$).

The anarchy value of a latency function $\ell$ should be interpreted as the worst
possible ratio between the cost of a Nash flow and of an optimal flow in a two-node,
two-link network where one edge possesses latency function $\ell$ and the other possesses
a constant latency function; the worst case is taken over choices of the constant and
of the traffic rate.
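
Definition 3.3.2 translates directly into a small numerical procedure. The sketch below is a minimal illustration under stated assumptions: the caller supplies both $\ell$ and its marginal cost function $\ell^*$ (taken here to be the derivative of $x \cdot \ell(x)$, consistent with the linear case $\ell^*(x) = 2ax + b$), the marginal cost is continuous and nondecreasing so that bisection applies, and the supremum over $r$ is replaced by a maximum over a finite list of rates, which in general yields only a lower estimate. All function names are our own.

```python
from typing import Callable, Iterable

Fn = Callable[[float], float]


def solve_lambda(marginal: Fn, target: float, r: float, iters: int = 200) -> float:
    """Bisection for lambda in [0, 1] with marginal(lambda * r) = target,
    assuming marginal is continuous and nondecreasing."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if marginal(mid * r) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def ratio_at(latency: Fn, marginal: Fn, r: float) -> float:
    """[lambda * mu + (1 - lambda)]^(-1) for one rate r with latency(r) > 0."""
    lam = solve_lambda(marginal, latency(r), r)
    mu = latency(lam * r) / latency(r)
    return 1.0 / (lam * mu + (1.0 - lam))


def anarchy_value(latency: Fn, marginal: Fn, rates: Iterable[float]) -> float:
    """Maximum of ratio_at over the given rates (a grid stand-in for the sup)."""
    return max(ratio_at(latency, marginal, r) for r in rates if latency(r) > 0)


# l(x) = x has marginal cost 2x; l(x) = x^2 has marginal cost 3x^2.
print(anarchy_value(lambda x: x, lambda x: 2 * x, [0.5, 1.0, 2.0]))
print(anarchy_value(lambda x: x * x, lambda x: 3 * x * x, [1.0, 3.0]))
```

For a monomial the ratio does not depend on $r$, so a single rate already attains the supremum; for a function such as $\ell(x) = x + 1$ the ratio increases with $r$ and the grid value falls short of the supremum.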

Since  we  are  interested  only  in  the  “steepest”  latency  functions  of  a  class,  the 
next  definition  should  be  unsurprising. 

Definition 3.3.3 The anarchy value $\alpha(\mathcal{L})$ of a standard class $\mathcal{L}$ of latency functions
is

$$\alpha(\mathcal{L}) = \sup_{0 \ne \ell \in \mathcal{L}} \alpha(\ell).$$

Remark 3.3.4

(a) The anarchy value of a class lies in $[1, \infty]$ and need not be finite.

(b) The anarchy value seems a fearsome expression to compute analytically, but we
will see in Section 3.5 that it can typically be worked out in cases of practical
interest.

(c) There are also simpler definitions of “steepness” that provide nontrivial but
suboptimal upper bounds on the price of anarchy; see Sections A.1 and A.2 of
Appendix A.

We have already argued informally that if $\mathcal{L}$ is a standard class of latency functions
containing the constant functions, then there are instances on a network with
two nodes and two links and latency functions in $\mathcal{L}$ with ratio $\rho$ arbitrarily close to
$\alpha(\mathcal{L})$. On the other hand, there is no reason a priori to expect the anarchy value to
have any connection to instances defined on more general networks (even to those
defined on parallel networks with more than two links). The central result of this
section is that, under very weak conditions on the class of allowable latency functions,
$\alpha(\mathcal{L})$ upper bounds $\rho(G, r, \ell)$ for any instance $(G, r, \ell)$ with latency functions
in $\mathcal{L}$ (with an arbitrary network topology and an arbitrary number of commodities).

3.3.2  Proof  Approach 

We  next  discuss  our  proof  approach.  At  the  highest  level,  the  proof  of  this  section 
is  inspired  by  that  of  the  last  section,  which  shows  that  the  price  of  anarchy  for 
networks with linear latency functions is precisely $\frac{4}{3}$. Let us recapitulate the three
main steps of that proof. First, we used the characterizations of Nash and optimal
flows (via Corollary 2.3.2) to show that if $f$ is a flow at Nash equilibrium for an
instance $(G, r, \ell)$ with linear latency functions, then the scaled-down flow $f/2$ is
optimal for the instance $(G, r/2, \ell)$. Second, we lower bounded the cost of $f/2$ in
terms of the cost of $f$; this was not difficult since the scaled-down flow $f/2$ was a
“significant fraction” of $f$. Finally, we lower bounded the cost of augmenting the
flow $f/2$ to a flow optimal for $(G, r, \ell)$ in terms of the cost of $f$. This was the most
difficult part of the proof; roughly, we leveraged the connection between Nash and
optimal flows given in Corollary 2.3.2 to show that the marginal cost of routing new
flow with respect to $f/2$ is high, and thus augmenting the flow $f/2$ to a flow feasible
for the full set of traffic rates $r$ is costly.

A direct attempt at adapting the three-step approach of the previous section
to more general latency functions fails immediately. In particular, for nonlinear
latency functions (even for quadratic latency functions), there is no constant $c$ for
which a scaled-down version $f/c$ of a Nash flow $f$ is optimal for the reduced traffic
rates $r/c$. Thus, it is not at all clear how to exploit our characterizations of Nash
and optimal flows to relate their respective costs. To circumvent this problem, we
view the proof approach of the previous section in the following more general way:
chop up an optimal flow into two “pieces” (in the linear latency case, $f/2$ and an
augmentation from $f/2$ to a flow feasible for rates $r$) such that each piece can be
lower-bounded in terms of the cost of a Nash flow. Guided by a desire to define the
second piece of the optimal flow as an augmentation of the first and to lower bound
its cost by means of marginal cost functions (as in the linear latency case), we will
define the first piece in a way that ensures that any augmentation with respect to
it has large marginal cost. Unfortunately, this requires scaling down a Nash flow $f$
by different factors on different edges, thereby producing an object which is not a
flow (it is a more general object that need not obey conservation constraints, which
we call a pseudoflow). While this does not significantly complicate the lower bound
for the cost of the scaled-down pseudoflow (it is a “significant fraction” of the Nash
flow, as before), a more careful analysis is now required to lower bound the cost of
an augmentation from the scaled-down pseudoflow to a flow feasible for the original
instance (as we are augmenting with respect to an object more complicated than
simply a flow at reduced traffic rates).

3.3.3  Proof  of  Upper  Bound 

We  now  turn  toward  making  these  ideas  precise.  We  first  define  what  we  mean  by  a 
“scaled-down  pseudoflow”.  The  idea  is  to  scale  down  the  amount  of  Nash  flow  on  a 
single  edge  until  the  value  of  the  marginal  cost  function  equals  the  original  latency 
incurred  by  the  Nash  flow  on  that  edge  (this  original  latency  is  then  our  definition 
of “large marginal cost”). Formally, if $f$ is a flow at Nash equilibrium, our scaled-down
pseudoflow will be defined by $\{\lambda_e f_e\}_{e \in E}$ where $\lambda_e$ satisfies $\ell_e^*(\lambda_e f_e) = \ell_e(f_e)$
(as in Definition 3.3.2). As discussed following Definition 3.3.2, these scaling factors
always  exist  but  need  not  be  unique;  our  analysis  must  work  with  an  arbitrary  choice 
of  scaling  factors. 

The  next  lemma  formalizes  the  notion  of  “breaking  up  the  optimal  flow  into  two 
pieces”. Again, the idea is to express the cost of the optimal flow as one term that
is a scaled-down version of a Nash flow, and a second term that corresponds to an
augmentation with respect to large marginal costs.

Lemma 3.3.5 Let $f^*$ and $f$ be optimal and Nash flows, respectively, for instance
$(G, r, \ell)$ with standard latency functions. For an edge $e$, let $\lambda_e \in (0, 1)$ be a solution
to $\ell_e^*(\lambda_e f_e) = \ell_e(f_e)$. Then,

$$C(f^*) \;\ge\; \sum_e \left[ \ell_e(\lambda_e f_e)\, \lambda_e f_e + (f_e^* - \lambda_e f_e)\, \ell_e(f_e) \right].$$

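Proof. Since each edge latency function $\ell_e$ is standard, each marginal cost function
$\ell_e^*$ is nondecreasing. For an edge $e$, we may thus write

$$\begin{aligned}
\ell_e(f_e^*) f_e^* &= \ell_e(\lambda_e f_e)\, \lambda_e f_e + \int_{\lambda_e f_e}^{f_e^*} \ell_e^*(x)\, dx \\
&\ge \ell_e(\lambda_e f_e)\, \lambda_e f_e + (f_e^* - \lambda_e f_e)\, \ell_e^*(\lambda_e f_e) \\
&= \ell_e(\lambda_e f_e)\, \lambda_e f_e + (f_e^* - \lambda_e f_e)\, \ell_e(f_e)
\end{aligned}$$

with the final equality holding by the definition of $\lambda_e$. Summing over all edges proves
the lemma. ■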

Note  that  neither  the  statement  nor  the  proof  of  Lemma  3.3.5  assumes  that  the 
expression $f_e^* - \lambda_e f_e$ is nonnegative for all edges $e$; as in the previous section, the
augmentation from the pseudoflow defined by $\{\lambda_e f_e\}_{e \in E}$ to a flow $f^*$ optimal for the
original  instance  may  increase  or  decrease  the  amount  of  flow  on  an  edge. 

To  lower  bound  the  right-hand  side  of  Lemma  3.3.5,  we  require  two  more  easy 
lemmas.  The  next  lemma  simply  rephrases  Definitions  3.3.2  and  3.3.3. 

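Lemma 3.3.6 Let $\mathcal{L}$ be a standard class of latency functions with anarchy value
$\alpha(\mathcal{L})$. For $\ell \in \mathcal{L}$ and $f > 0$, let $\lambda \in (0, 1)$ satisfy $\ell^*(\lambda f) = \ell(f)$ and define $\mu$ by
$\mu = \ell(\lambda f)/\ell(f)$ (if $\ell(f) = 0$, put $\mu = 1$). Then $\lambda \mu + (1 - \lambda) \ge \frac{1}{\alpha(\mathcal{L})}$.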

Our final lemma states that if $f$ is a Nash flow for $(G, r, \ell)$, then $f$ is a min-cost
flow (in the classical sense of network flow theory [179]) with respect to the cost
vector $\ell_e(f_e)$.

Lemma 3.3.7 Let $f$ be at Nash equilibrium and $f^*$ feasible for instance $(G, r, \ell)$.
Then,

$$\sum_e \ell_e(f_e)\, f_e \;\le\; \sum_e \ell_e(f_e)\, f_e^*.$$

Proof. Let $L_i(f)$ denote the common latency of every $s_i$-$t_i$ flow path of $f$, so that

$$\sum_e \ell_e(f_e)\, f_e = C(f) = \sum_{i=1}^{k} L_i(f)\, r_i$$

by Proposition 2.2.2. Since $f$ is at Nash equilibrium, $\ell_P(f) \ge L_i(f)$ for every
path $P \in \mathcal{P}_i$; we may thus write

$$\sum_e \ell_e(f_e)\, f_e^* = \sum_{i=1}^{k} \sum_{P \in \mathcal{P}_i} \ell_P(f)\, f_P^* \;\ge\; \sum_{i=1}^{k} L_i(f)\, r_i,$$

proving the lemma. ■

With  all  of  the  preliminaries  now  in  place,  we  state  and  prove  the  main  result 
of this section: the anarchy value of a standard class $\mathcal{L}$ of latency functions upper
bounds the ratio $\rho$ for any instance with latency functions in $\mathcal{L}$.²


Theorem 3.3.8 Let $\mathcal{L}$ be a standard class of latency functions with anarchy value
$\alpha(\mathcal{L})$. Let $(G, r, \ell)$ denote an instance with latency functions drawn from $\mathcal{L}$. Then
$\rho(G, r, \ell) \le \alpha(\mathcal{L})$.


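Proof. By Lemma 3.3.5 we have

$$\begin{aligned}
C(f^*) &\ge \sum_e \left[ \ell_e(\lambda_e f_e)\, \lambda_e f_e + (f_e^* - \lambda_e f_e)\, \ell_e(f_e) \right] \\
&= \sum_e \left[ \mu_e \lambda_e f_e + (1 - \lambda_e) f_e + (f_e^* - f_e) \right] \ell_e(f_e) \\
&= \sum_e \left[ \mu_e \lambda_e f_e + (1 - \lambda_e) f_e \right] \ell_e(f_e) + \sum_e \left[ f_e^* - f_e \right] \ell_e(f_e)
\end{aligned}$$

where $\lambda_e \in (0, 1)$ is chosen (arbitrarily) to satisfy $\ell_e^*(\lambda_e f_e) = \ell_e(f_e)$ and where
$\mu_e \in [0, 1]$ is defined by $\mu_e = \ell_e(\lambda_e f_e)/\ell_e(f_e)$ (if $\ell_e(f_e) = 0$, put $\mu_e = 1$). The second
sum is nonnegative by Lemma 3.3.7, so we may derive

$$C(f^*) \;\ge\; \sum_e \left[ \mu_e \lambda_e f_e + (1 - \lambda_e) f_e \right] \ell_e(f_e);$$

applying Lemma 3.3.6 we obtain

$$\begin{aligned}
C(f^*) &\ge \sum_e \left[ \mu_e \lambda_e + (1 - \lambda_e) \right] \ell_e(f_e)\, f_e \\
&\ge \frac{1}{\alpha(\mathcal{L})} \sum_e \ell_e(f_e)\, f_e \\
&= \frac{C(f)}{\alpha(\mathcal{L})}
\end{aligned}$$

and the theorem is proved. ■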
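
Theorem 3.3.8 can be probed numerically on parallel-link networks with more than two links. The sketch below assumes latency functions of the form $a x^p + b$ with $a > 0$ and $b \ge 0$ (so that each latency is strictly increasing and a common cost level can be found by bisection), and it takes the anarchy value of this class to be the closed form $(1 - p (p+1)^{-(p+1)/p})^{-1}$ quoted at the end of Section 3.2. The function names and the random test data are our own illustrative choices.

```python
import random
from typing import List, Tuple

Edge = Tuple[float, float]  # (a, b) for the latency a * x^p + b, with a > 0


def equalize(edges: List[Edge], rate: float, p: int, scale: float) -> List[float]:
    """Flow on parallel links for which scale * a * x^p + b is equal on all used
    links and no larger than on unused ones. scale = 1 equalizes latencies
    (Nash flow); scale = p + 1 equalizes marginal costs (optimal flow)."""
    def flow_at(level: float) -> List[float]:
        return [((level - b) / (scale * a)) ** (1.0 / p) if level > b else 0.0
                for a, b in edges]

    lo = min(b for _, b in edges)
    hi = max(scale * a * rate ** p + b for a, b in edges)
    for _ in range(200):  # bisection on the common cost level
        mid = (lo + hi) / 2
        if sum(flow_at(mid)) < rate:
            lo = mid
        else:
            hi = mid
    return flow_at(hi)


def cost(edges: List[Edge], flow: List[float], p: int) -> float:
    """Total latency: sum over links of latency times flow."""
    return sum((a * x ** p + b) * x for (a, b), x in zip(edges, flow))


def rho(edges: List[Edge], rate: float, p: int) -> float:
    """Ratio of Nash cost to optimal cost for a parallel-link instance."""
    nash = equalize(edges, rate, p, 1.0)
    opt = equalize(edges, rate, p, p + 1.0)
    return cost(edges, nash, p) / cost(edges, opt, p)


def poly_anarchy_value(p: int) -> float:
    """Closed form assumed for the class {a * x^p + b : a, b >= 0}."""
    return 1.0 / (1.0 - p * (p + 1.0) ** (-(p + 1.0) / p))


rng = random.Random(0)  # seeded, hypothetical instance data
sample = [(rng.uniform(0.1, 2.0), rng.uniform(0.0, 2.0)) for _ in range(4)]
print(rho(sample, 1.5, 2), poly_anarchy_value(2))
```

The same bisection serves both flows because, for this class, Nash and optimal flows are characterized by equalizing latencies and marginal costs respectively on the links that carry traffic.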


² We are indebted to Amir Ronen for substantially simplifying our original proof of this theorem.

3.4 The Price of Anarchy is Independent of the
Network Topology

With Theorem 3.3.8 in hand, it is now a relatively easy matter to prove that the price
of anarchy is independent of the network topology. In Subsection 3.4.1 we prove that,
with respect to a standard class of allowable edge latency functions that contains
the constant functions, the worst possible value of $\rho(G, r, \ell)$ for a multicommodity
instance $(G, r, \ell)$ is realized (up to an arbitrarily small additive factor) by a single-commodity
instance on a two-node, two-link network. In Subsection 3.4.2, we prove
that under significantly weaker conditions on the class of allowable latency functions,
the worst-case value of $\rho(G, r, \ell)$ is achieved (again, up to an arbitrarily small factor)
by a single-commodity instance on a network of parallel links.

3.4.1  Lower  Bounds  in  Two-Link  Networks 

We  begin  by  formalizing  an  argument  of  the  previous  section;  the  following  lemma 
is  essentially  a  restatement  of  Definitions  3.3.2  and  3.3.3. 

Lemma 3.4.1 Let $G_2$ denote the graph with one source vertex, one destination vertex,
and two edges directed from source to destination. Let $\mathcal{L}$ denote a standard class
of latency functions containing the constant functions, with anarchy value $\alpha(\mathcal{L})$. If
$\mathcal{I}_2$ denotes the set of all instances with underlying network $G_2$ and latency functions
in $\mathcal{L}$, then

$$\sup_{(G_2, r, \ell) \in \mathcal{I}_2} \rho(G_2, r, \ell) \;\ge\; \alpha(\mathcal{L}).$$

Proof. We will assume that $\alpha(\mathcal{L})$ is finite, and will leave the straightforward
modifications necessary for the $\alpha(\mathcal{L}) = +\infty$ case to the interested reader.

For any $\epsilon > 0$, choose a nonzero latency function $\ell_1 \in \mathcal{L}$, a positive number
$r > 0$ with $\ell_1(r) > 0$, and a scalar $\lambda \in (0, 1)$ satisfying $\ell_1^*(\lambda r) = \ell_1(r)$ so that
$[\lambda \mu + (1 - \lambda)]^{-1} \ge \alpha(\mathcal{L}) - \epsilon$, where $\mu = \ell_1(\lambda r)/\ell_1(r)$. Let $\ell_2 \in \mathcal{L}$ denote the constant
function that is everywhere equal to $\ell_1(r)$. Define an instance on $G_2$ with latency
functions $\ell_1$ and $\ell_2$ and traffic rate $r$. The total latency incurred by the Nash flow
is $\ell_1(r)\, r$, while that of the optimal flow is $\ell_1(r)\, r\, [\lambda \mu + (1 - \lambda)]$; hence the $\rho$-value of
this instance is at least $\alpha(\mathcal{L}) - \epsilon$. Since $\epsilon > 0$ was arbitrary, the lemma follows. ■
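
The construction in this proof can be carried out in closed form for a monomial. As a worked illustration, assume the class contains $\ell_1(x) = x^p$ and the constant function 1, take $r = 1$, and use the marginal cost $\ell_1^*(x) = (p + 1) x^p$ (the derivative of $x \cdot \ell_1(x)$); these choices are ours and are not fixed by the lemma.

```latex
\begin{align*}
\ell_1^*(\lambda) = \ell_1(1) = 1 \quad&\Longrightarrow\quad \lambda = (p + 1)^{-1/p},
  \qquad \mu = \ell_1(\lambda) = \lambda^p = \frac{1}{p + 1}, \\
C(f) = 1, \qquad C(f^*) &= \lambda \mu + (1 - \lambda)
  = 1 - \frac{p}{p + 1}\, \lambda = 1 - p\, (p + 1)^{-(p+1)/p}, \\
\rho &= \bigl( 1 - p\, (p + 1)^{-(p+1)/p} \bigr)^{-1}.
\end{align*}
```

For $p = 1$ this gives $\rho = \tfrac{4}{3}$, and for general $p$ it matches the upper bound quoted at the end of Section 3.2, so the two-link instance is a worst case for this class.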

Combining  Theorem  3.3.8  and  Lemma  3.4.1,  we  find  that  the  price  of  anarchy 
with respect to a standard class of latency functions containing the constant functions
is independent of the class of allowable network topologies (thereby generalizing
Theorem 3.2.6 and the matching lower bound of Pigou’s example).

Theorem 3.4.2 Let $G_2$ denote the graph with one source vertex, one destination
vertex, and two edges directed from source to destination. Let $\mathcal{L}$ be a standard class
of latency functions containing the constant functions. If $\mathcal{I}$ denotes the set of all
instances with latency functions in $\mathcal{L}$ and $\mathcal{I}_2 \subseteq \mathcal{I}$ the instances with underlying
network $G_2$, then

$$\sup_{(G_2, r, \ell) \in \mathcal{I}_2} \rho(G_2, r, \ell) \;=\; \alpha(\mathcal{L}) \;=\; \sup_{(G, r, \ell) \in \mathcal{I}} \rho(G, r, \ell).$$

3.4.2  Lower  Bounds  in  Networks  of  Parallel  Links 

We now relax the assumption that the class of allowable latency functions contains
all of the constant functions, and assume instead the following much weaker
condition: for any positive real number $a$, there is a latency function $\ell$ satisfying
$\ell(0) = a$. We call such a class of latency functions diverse. For any class of latency
functions that is closed under multiplication by positive scalars³, diversity merely
asserts that some latency function is positive when evaluated at 0. Under these
weaker hypotheses, we have the following.

Lemma 3.4.3 Let $G_m$ denote the graph with one source vertex, one destination
vertex, and $m$ edges directed from source to destination. Let $\mathcal{L}$ be a standard and
diverse class of latency functions with anarchy value $\alpha(\mathcal{L})$. If $\mathcal{I}_m$ denotes the set of
all instances with underlying network $G_m$ and latency functions in $\mathcal{L}$, then

$$\sup_{(G, r, \ell) \in \bigcup_m \mathcal{I}_m} \rho(G, r, \ell) \;\ge\; \alpha(\mathcal{L}).$$

Proof. We again assume for simplicity that $\alpha(\mathcal{L})$ is finite. For any $\epsilon > 0$, choose
a nonzero latency function $\ell_1 \in \mathcal{L}$, a positive number $r > 0$ with $\ell_1(r) > 0$, and a
scalar $\lambda \in (0, 1)$ satisfying $\ell_1^*(\lambda r) = \ell_1(r)$ so that $[\lambda \mu + (1 - \lambda)]^{-1} \ge \alpha(\mathcal{L}) - \epsilon/2$,
where $\mu = \ell_1(\lambda r)/\ell_1(r)$. Since $\mathcal{L}$ is diverse, there is a function $\ell_2 \in \mathcal{L}$ satisfying
$\ell_2(0) = \ell_1(r)$. The main idea of the proof is to use many links, all with latency
function $\ell_2$, to approximately “simulate” a single link with the constant latency
function everywhere equal to $\ell_1(r)$.

$^3$ Since a scalar multiplication of the latency functions can be effected simply by
changing the units in which we measure latency, we expect most classes of interest to
satisfy this property.
Let $m$ be so large that $\ell_2(r/(m-1)) \le \ell_1(r) + \delta$, where $\delta$ is a
sufficiently small positive number (depending on $\epsilon$) to be chosen later;
existence of the integer $m$ follows from continuity of $\ell_2$ at 0. Define an
instance on $G_m$ with traffic rate $r$, latency function $\ell_1$ on one link, and
latency function $\ell_2$ on the other $m-1$ links. The total latency incurred by the
Nash flow is $\ell_1(r)r$, as all flow is routed on the link with latency function
$\ell_1$. The flow that routes $\lambda r$ units of flow on the link with latency
function $\ell_1$ at a cost of $\lambda r \mu \ell_1(r)$ and $(1-\lambda)r/(m-1)$ units
of flow on each of the other $m-1$ links has total latency at most
$\ell_1(r)\,r\,[\lambda\mu + (1-\lambda) + \delta/\ell_1(r)]$ by choice of $m$ (each of
the other links carries at most $r/(m-1)$ units of flow and $\ell_2$ is nondecreasing).
Choosing $\delta$ sufficiently small, we obtain an instance with $\rho$-value at least
$\alpha(\mathcal{L}) - \epsilon$. Since $\epsilon > 0$ was arbitrary, the lemma
follows. ■
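
As a hedged numerical illustration of this construction (not part of the original
proof), the sketch below takes $\ell_1(x) = x$, $r = 1$, and $\ell_2(x) = 1 + x$, so that
$\ell_2(0) = \ell_1(1) = 1$. These particular functions and all names in the code are our
own choices; the sketch also assumes that, by symmetry and convexity, an optimal flow
splits the traffic not sent on the $\ell_1$ link evenly over the $m - 1$ identical links.
The ratio grows with $m$ toward $4/3$, the anarchy value of $\ell_1(x) = x$.

```python
def cost(x, m):
    """Total latency when x units use the link with l1(x) = x and the
    remaining 1 - x units are split evenly over m - 1 links with l2(y) = 1 + y."""
    y = (1.0 - x) / (m - 1)
    return x * x + (1.0 - x) * (1.0 + y)


def min_cost(m, iters=200):
    """Minimize the convex function cost(., m) over [0, 1] by ternary search."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if cost(a, m) < cost(b, m):
            hi = b
        else:
            lo = a
    return cost((lo + hi) / 2.0, m)


def rho(m):
    """Ratio of the Nash cost to the (numerically computed) optimal cost."""
    nash_cost = 1.0  # all traffic on the l1 link, since l1(1) = 1 <= l2(0) = 1
    return nash_cost / min_cost(m)


if __name__ == "__main__":
    for m in (2, 5, 50, 5000):
        print(m, rho(m))
```

With only two links the ratio stays well below $4/3$, which is consistent with the need
for many parallel links in the lemma.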

Theorem  3.3.8  and  Lemma  3.4.3  together  imply  the  main  result  of  this  section. 

Theorem 3.4.4 Let $G_m$ denote the graph with one source vertex, one destination
vertex, and $m$ edges directed from source to destination. Let $\mathcal{L}$ be a
standard and diverse class of latency functions. If $\mathcal{I}$ denotes the set of
all instances with latency functions in $\mathcal{L}$, and
$\mathcal{I}_m \subseteq \mathcal{I}$ the instances with underlying network $G_m$, then

$$\sup_{(G,r,\ell)\in\bigcup_m \mathcal{I}_m} \rho(G,r,\ell) \;=\; \alpha(\mathcal{L}) \;=\; \sup_{(G,r,\ell)\in\mathcal{I}} \rho(G,r,\ell).$$

Remark 3.4.5 The conclusion of the theorem is false with $\bigcup_m \mathcal{I}_m$
replaced by $\mathcal{I}_2$ (for a counterexample, take
$\mathcal{L} = \{\ell(x) = x\} \cup \{\ell(x) = a(1+x) : a > 0\}$). The conclusion of
the theorem is also false when the class of allowable latency functions need not be
diverse (for a counterexample, let $\mathcal{L} = \{\ell(x) = 1 + x\}$).

Remark  3.4.6  That  the  price  of  anarchy  is  independent  of  the  network  topology 
is  a  remarkable  fact;  to  better  appreciate  this,  we  will  foreshadow  some  forthcoming 
results.  In  Chapter  4  we  will  generalize  the  traffic  model  studied  in  this  chapter  in 
several  different  directions,  and  we  will  often  see  that  general  network  topologies  are 
more  poorly-behaved  than  networks  of  parallel  links.  In  Chapter  5  we  will  study 
Braess’s  Paradox  and  generalizations  of  it,  and  will  discover  that  the  severity  of 
the  paradox  grows  with  the  network  size  and  cannot  occur  in  networks  of  parallel 
links.  Finally,  in  Chapter  6  we  will  generalize  the  traffic  model  studied  here  to 
accommodate  a  different  equilibrium  concept,  Stackelberg  equilibria,  and  will  prove 
that  the  inefficiency  of  such  equilibria  can  be  strictly  larger  in  general  network 
topologies  than  in  networks  of  parallel  links. 
3.5 Computing the Price of Anarchy

In this section, we leverage the results of Sections 3.3 and 3.4 to show that
computing the price of anarchy with respect to an (almost) arbitrary standard class of
latency functions reduces to computing the anarchy value of the class, even when the
diversity condition of Theorem 3.4.4 fails (as in the important case of M/M/1 delay
functions with some minimum allowable queue service rate). This provides a general
reduction from a combinatorial problem (finding a worst-case instance among all
possible multicommodity flow instances) to a simpler analytical one (finding the
"steepest" latency function in a given class). Subsection 3.5.1 describes this method,
and Subsection 3.5.2 computes the price of anarchy for several important function
classes.

3.5.1  More  Techniques  for  Computing  the  Price  of  Anarchy 

In the previous section, we saw that the price of anarchy with respect to a standard
and diverse class of latency functions is precisely the anarchy value of the class
(Theorem 3.4.4). In this subsection we will show that this fact remains true under
even weaker hypotheses. From a computational perspective, this result has the
following interpretation: to compute the price of anarchy with respect to an (almost)
arbitrary standard class of latency functions $\mathcal{L}$, it suffices to compute the
worst-possible ratio between the cost of a Nash and of an optimal flow in a two-node,
two-link network where one link possesses a constant latency function and the other
link possesses a latency function of the form $\nu\ell$ for $\ell \in \mathcal{L}$ and a
positive scalar $\nu > 0$ (even though $\mathcal{L}$ need not contain $\nu\ell$ or any
constant functions).

The  first  step  of  this  reduction  is  the  following  lemma,  which  implies  that  we 
can  always  assume  that  our  class  of  latency  functions  is  closed  under  multiplication 
by  positive  scalars. 

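Lemma 3.5.1 Let $\mathcal{L}$ be a standard class of latency functions, and define
$\bar{\mathcal{L}}$ as the closure of $\mathcal{L}$ under multiplication by positive
scalars (so $\bar{\mathcal{L}} = \{\nu\ell : \ell \in \mathcal{L}, \nu > 0\}$). Let
$\mathcal{I}$ denote the set of instances with latency functions in $\mathcal{L}$, and
$\bar{\mathcal{I}}$ the set of instances with latency functions in
$\bar{\mathcal{L}}$. Then,

$$\sup_{(G,r,\ell)\in\mathcal{I}} \rho(G,r,\ell) \;=\; \sup_{(G,r,\ell)\in\bar{\mathcal{I}}} \rho(G,r,\ell).$$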




Proof. The left-hand side trivially lower bounds the right-hand side since
$\bar{\mathcal{L}} \supseteq \mathcal{L}$ and hence
$\bar{\mathcal{I}} \supseteq \mathcal{I}$. For the reverse inequality, we will show
that for any instance $(G,r,\bar{\ell}) \in \bar{\mathcal{I}}$ and any $\epsilon > 0$,
there is an instance $(G',r,\ell) \in \mathcal{I}$ satisfying
$\rho(G',r,\ell) \ge \rho(G,r,\bar{\ell}) - \epsilon$. Fix an instance
$(G,r,\bar{\ell}) \in \bar{\mathcal{I}}$ and $\epsilon > 0$, and for an edge $e$ of $G$
write $\bar{\ell}_e = \nu_e \ell_e$ for $\nu_e > 0$ and $\ell_e \in \mathcal{L}$. The
ratio $\rho$ is a continuous function of each scalar $\nu_e$ (holding the network $G$
and the traffic rate vector $r$ fixed), and we may thus replace each $\nu_e$ by a
sufficiently close positive rational number $\eta_e$ to obtain a new instance with
$\rho$-value at least $\rho(G,r,\bar{\ell}) - \epsilon$. Clearing denominators, we may
assume that each scalar $\eta_e$ is a positive integer (multiplying all latency
functions of an instance by a common positive number does not affect its
$\rho$-value). Finally, replacing each edge $e$ with a directed path of $\eta_e$ edges,
each endowed with latency function $\ell_e$, we obtain a network $G'$ with latency
functions in $\mathcal{L}$ and with $\rho$-value at least
$\rho(G,r,\bar{\ell}) - \epsilon$. ■
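
The rational-approximation and denominator-clearing steps of this proof can be sketched
as follows. This is a minimal illustration under our own assumptions: the function name
`integer_path_lengths` and the bound `max_denominator` are hypothetical, and the sketch
only produces the integers $\eta_e$; it does not build the subdivided network.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def integer_path_lengths(scalars, max_denominator=1000):
    """Approximate each positive scalar by a rational number, then clear
    denominators to obtain one positive integer (a path length) per edge.

    Returns (lengths, common), where lengths[i] / common is the rational
    approximation of scalars[i]."""
    rationals = [Fraction(s).limit_denominator(max_denominator) for s in scalars]
    if any(q <= 0 for q in rationals):
        raise ValueError("approximation must stay positive; raise max_denominator")
    # Least common multiple of all denominators.
    common = reduce(lambda a, b: a * b // gcd(a, b),
                    (q.denominator for q in rationals), 1)
    return [int(q * common) for q in rationals], common


if __name__ == "__main__":
    print(integer_path_lengths([0.75, 2.5, 1.0 / 3.0]))
```

A path of `lengths[i]` edges, each with latency function $\ell_e$, has latency
`lengths[i]` times $\ell_e$, which is the approximated scalar multiplied by the common
factor `common`; as the proof notes, a common positive factor does not change $\rho$.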

The  following  observation  will  also  be  useful. 

Lemma 3.5.2 Let $\mathcal{L}$ be a standard class of latency functions, and define
$\bar{\mathcal{L}}$ as the closure of $\mathcal{L}$ under multiplication by positive
scalars. Then $\mathcal{L}$ and $\bar{\mathcal{L}}$ have equal anarchy value.

Proof. Simply note that the functions $\ell$ and $\nu\ell$ (for
$0 \ne \ell \in \mathcal{L}$ and $\nu > 0$) have equal anarchy value. ■

Lemmas 3.5.1 and 3.5.2 yield the following theorem, which reduces computing
the price of anarchy (the combinatorial problem of finding a worst-possible
multicommodity flow instance) to computing the anarchy value (the simpler analytical
problem of determining the worst behavior exhibited by any function in a given
class).

Theorem 3.5.3 Let $\mathcal{L}$ be a standard class of latency functions containing a
function $\ell$ satisfying $\ell(0) > 0$, with anarchy value $\alpha(\mathcal{L})$. If
$\mathcal{I}$ denotes the set of instances with latency functions in $\mathcal{L}$,
then

$$\sup_{(G,r,\ell)\in\mathcal{I}} \rho(G,r,\ell) \;=\; \alpha(\mathcal{L}).$$

Proof. Since $\mathcal{L}$ is standard and contains a function that is positive at 0,
the class $\bar{\mathcal{L}} = \{\nu\ell : \ell \in \mathcal{L}, \nu > 0\}$ is standard
and diverse. By Theorem 3.4.4, we have

$$\sup_{(G,r,\ell)\in\bar{\mathcal{I}}} \rho(G,r,\ell) \;=\; \alpha(\bar{\mathcal{L}}),$$

where $\bar{\mathcal{I}}$ denotes the set of instances with latency functions in
$\bar{\mathcal{L}}$. Applying Lemmas 3.5.1 and 3.5.2, we obtain the desired
equality. ■

Remark 3.5.4 The conclusion of Theorem 3.5.3 fails if the hypothesis that some
function is positive with zero congestion is omitted (recall Corollary 3.2.2 and
consider the counterexample class $\mathcal{L} = \{ax : a > 0\}$). We do not know if
the assumption that the function class is standard can be omitted. We leave open the
problem of computing the price of anarchy for classes of latency functions that fail
to satisfy these two hypotheses, though it is not clear if such function classes have
any practical import.

3.5.2  Applications 

We  are  finally  prepared  to  put  our  techniques  to  use  in  computing  the  price  of 
anarchy  for  some  concrete  function  classes.  We  give  only  three  illustrative  examples; 
it  will  be  obvious  that  many  other  function  classes  can  be  treated  in  a  similar  way. 

Polynomial  Latency  Functions 

For a positive integer $p$, let $\mathcal{L}_p$ denote the set of latency functions that
are polynomials with nonnegative coefficients and degree at most $p$. As a first
showcase for our machinery, we next compute the price of anarchy with respect to
latency functions in $\mathcal{L}_p$.

Proposition 3.5.5 If $\mathcal{I}_p$ is the set of instances with latency functions in
$\mathcal{L}_p$, then

$$\sup_{(G,r,\ell)\in\mathcal{I}_p} \rho(G,r,\ell) \;=\; \left[1 - p\,(p+1)^{-(p+1)/p}\right]^{-1} \;=\; \Theta\!\left(\frac{p}{\ln p}\right).$$

Proof. Since $\mathcal{L}_p$ is standard and contains the constant functions,
Theorem 3.4.2 implies that the price of anarchy is simply the anarchy value of
$\mathcal{L}_p$. We claim that it suffices to compute the anarchy value of the smaller
function class consisting of functions of $\mathcal{L}_p$ comprising only one term,
namely $\tilde{\mathcal{L}}_p = \{a x^i : a \ge 0,\ i \in \{0,1,2,\ldots,p\}\}$. This
claim is valid because an instance $(G,r,\ell)$ with latency functions in
$\mathcal{L}_p$ can be transformed into an equivalent instance with latency functions
in $\tilde{\mathcal{L}}_p$ by replacing an edge $e$ of $G$ with latency function
$\ell_e(x) = \sum_{i=0}^{p} a_i x^i$ by a directed path of $p+1$ edges, with the $i$th
edge of the path possessing latency function $\ell_{e,i}(x) = a_i x^i$.$^4$

We next compute the anarchy value $\alpha(\ell)$ of an arbitrary nonzero function
$\ell(x) = a x^i$ of $\tilde{\mathcal{L}}_p$ (recall Definition 3.3.2). If $i = 0$ then
$\alpha(\ell) = 1$; otherwise, $\ell^*$ is strictly increasing and the scalar $\lambda$
is uniquely determined by the choice of $r$. In this case, for $r > 0$ we have
$\lambda = (i+1)^{-1/i}$, hence $\mu = \lambda^i = (i+1)^{-1}$, hence
$[\lambda\mu + (1-\lambda)]^{-1} = [(i+1)^{-(i+1)/i} + (1 - (i+1)^{-1/i})]^{-1} = [1 - i\,(i+1)^{-(i+1)/i}]^{-1}$.
Since this expression is independent of $r > 0$, we obtain
$\alpha(\ell) = [1 - i\,(i+1)^{-(i+1)/i}]^{-1}$. This expression is independent of $a$
and is increasing in $i$ on $[0,p]$ (as shown by a simple derivative test), so the
functions of $\tilde{\mathcal{L}}_p$ with largest anarchy value are those of the form
$a x^p$ for $a > 0$; hence,
$\alpha(\mathcal{L}_p) = \alpha(\tilde{\mathcal{L}}_p) = [1 - p\,(p+1)^{-(p+1)/p}]^{-1}$. ■
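
To make the formula concrete, the following sketch evaluates the closed form of
Proposition 3.5.5 and compares it with the cost ratio of the two-link network with
latency functions $1$ and $x^p$ and unit traffic rate (the nonlinear variant of Pigou's
example). The function names are ours, and the optimal flow in `pigou_ratio` is obtained
from the first-order condition $(p+1)x^p = 1$, which we assume characterizes the optimum
of this convex problem.

```python
import math


def poa_polynomial(p):
    """Closed form of Proposition 3.5.5 for polynomials of degree at most p >= 1."""
    return 1.0 / (1.0 - p * (p + 1) ** (-(p + 1) / p))


def pigou_ratio(p):
    """Nash/optimal cost ratio in the two-link network with latency functions
    1 and x**p and traffic rate 1."""
    nash_cost = 1.0  # all traffic on the x**p link, which then has latency 1
    x = (p + 1) ** (-1.0 / p)  # optimal flow on the x**p link: (p + 1) x**p = 1
    opt_cost = x ** (p + 1) + (1.0 - x)
    return nash_cost / opt_cost


if __name__ == "__main__":
    for p in (1, 2, 3, 4, 10, 100):
        growth = p / math.log(p) if p > 1 else float("nan")
        print(p, poa_polynomial(p), pigou_ratio(p), growth)
```

For $p = 1$ both functions return $4/3$; for larger $p$ the two columns continue to
agree, and the last column shows the $p/\ln p$ growth rate for comparison.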

Remark  3.5.6  A  sharp  lower  bound  on  the  left-hand  side  of  Proposition  3.5.5 
is  provided  by  the  nonlinear  variant  of  Pigou’s  example  (Subsection  2.4.4);  the 
content  of  the  proposition  is  that  no  worse  example  is  possible,  even  in  arbitrary 
multicommodity  flow  networks. 

Delay  Functions  of  M/M/1  Queues 

Latency functions of the form $\ell(x) = (u - x)^{-1}$ arise as the (expected) delay
function of an M/M/1 queue$^5$ with service rate (or capacity) $u$ [80], and for this
reason have been extensively studied in the networking literature
[20, 105, 106, 113, 138]. These latency functions do not directly fit into our
framework, since they are defined only on the set $[0,u)$, rather than on all of
$[0,\infty)$. Nevertheless, only minor generalizations of our results are needed to
compute the price of anarchy in this setting.

We will fix two parameters, the largest allowable sum of all traffic rates $R_{\max}$
and the smallest allowable edge capacity $u_{\min}$. We will assume that
$R_{\max} < u_{\min}$; while it may seem unreasonable to assume that any edge of the
network has the capacity to carry all of the demand, our computations below will show
that, in the absence of further assumptions, the price of anarchy is $+\infty$ if the
sum of traffic rates can be arbitrarily close to (or greater than) the smallest edge
capacity. Under this assumption, the restricted domains of the latency functions pose
no difficulty; every feasible flow routes at most $R_{\max}$ units of flow on every
edge and hence has a well-defined cost.

$^4$ This maneuver illustrates a general principle: if $\mathcal{C}$ is the cone
generated by a (possibly infinite) standard class of latency functions $\mathcal{S}$
(i.e., $\mathcal{C}$ is all finite affine combinations of functions in $\mathcal{S}$),
then $\alpha(\mathcal{C}) = \alpha(\mathcal{S})$.

$^5$ By M/M/1, we mean a single queue with Poisson arrivals and exponentially
distributed service times [80].

Let $\mathcal{L}$ denote the set of latency functions
$\{\ell(x) = (u - x)^{-1} : u \ge u_{\min}\}$ and, for the purposes of this example
only, redefine the anarchy value $\alpha(\ell)$ of a latency function $\ell$ to be
$\alpha(\ell) = \sup_{r \,:\, 0 < r \le R_{\max}} [\lambda\mu + (1-\lambda)]^{-1}$,
where $\lambda$ is the unique scalar satisfying $\ell^*(\lambda r) = \ell(r)$ and
$\mu = \ell(\lambda r)/\ell(r)$. The key difference between this definition and the
original definition of anarchy value (Definition 3.3.2) is that the range of traffic
rates we consider is restricted to lie in $(0, R_{\max}]$ rather than $(0,\infty)$;
this ensures that the equations defining $\lambda$ and $\mu$ make sense.

Next, it is straightforward to check that Theorem 3.3.8 and hence Theorem 3.5.3
remain valid with our new definition of anarchy value, provided we only care about
the worst-possible value of $\rho$ achieved by instances whose sum of all traffic rates
is at most $R_{\max}$. Since the class $\mathcal{L}$ satisfies both hypotheses of
Theorem 3.5.3, computing the price of anarchy with respect to $\mathcal{L}$ for
instances with sum of all traffic rates at most $R_{\max}$ reduces to computing the
anarchy value of $\mathcal{L}$.


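Proposition 3.5.7 If $\mathcal{I}$ is the set of instances with latency functions in
$\mathcal{L}$ and sum of all traffic rates at most $R_{\max}$, then

$$\sup_{(G,r,\ell)\in\mathcal{I}} \rho(G,r,\ell) \;=\; \frac{1}{2}\left(1 + \sqrt{\frac{u_{\min}}{u_{\min} - R_{\max}}}\right).$$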


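Proof. The previous discussion implies that we need only check that
$\alpha(\mathcal{L}) = [1 + \sqrt{u_{\min}/(u_{\min} - R_{\max})}]/2$. We begin by
computing the anarchy value of an arbitrary function in $\mathcal{L}$, say
$\ell(x) = (u - x)^{-1}$ for $u \ge u_{\min}$. The marginal cost function $\ell^*$ is
given by $\frac{d}{dx}\left(x(u-x)^{-1}\right)$ and hence

$$\ell^*(x) = \frac{u}{(u-x)^2}.$$

Now fix $r \in (0, R_{\max}]$; $\lambda$ is defined to solve the equation
$\ell^*(\lambda r) = \ell(r)$ and hence satisfies

$$\frac{u}{(u - \lambda r)^2} = \frac{1}{u - r}.$$

Solving, we obtain $\lambda = (u - \sqrt{u(u-r)})/r$. Next,

$$\mu = \frac{\ell(\lambda r)}{\ell(r)} = \frac{u - r}{u - \lambda r} = \frac{u - r}{\sqrt{u}\sqrt{u - r}} = \sqrt{\frac{u - r}{u}}.$$

It remains to compute $[\lambda\mu + (1-\lambda)]^{-1}$:

$$\begin{aligned}
[\lambda\mu + (1-\lambda)]^{-1}
&= \left[\frac{u - \sqrt{u}\sqrt{u-r}}{r}\cdot\frac{\sqrt{u-r}}{\sqrt{u}} + 1 - \frac{u - \sqrt{u}\sqrt{u-r}}{r}\right]^{-1} \\
&= \left[\frac{2u\sqrt{u-r} - 2u\sqrt{u} + 2r\sqrt{u}}{r\sqrt{u}}\right]^{-1} \\
&= \frac{1}{2}\cdot\frac{r}{\sqrt{u}\sqrt{u-r} - (u-r)}\cdot\frac{\sqrt{u}\sqrt{u-r} + (u-r)}{\sqrt{u}\sqrt{u-r} + (u-r)} \\
&= \frac{1}{2}\cdot\frac{r\,[\sqrt{u}\sqrt{u-r} + (u-r)]}{u(u-r) - (u-r)^2} \\
&= \frac{1}{2}\cdot\frac{\sqrt{u}\sqrt{u-r} + (u-r)}{u-r}.
\end{aligned}$$

Since this expression is increasing in $r$, it follows that

$$\alpha(\ell) = \frac{1}{2}\left(1 + \sqrt{\frac{u}{u - R_{\max}}}\right).$$

Since the anarchy value is decreasing in the edge capacity, we have

$$\alpha(\mathcal{L}) = \frac{1}{2}\left(1 + \sqrt{\frac{u_{\min}}{u_{\min} - R_{\max}}}\right)$$

as claimed. ■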
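
The derivation can be checked numerically. The sketch below (function names are our
own) computes $[\lambda\mu + (1-\lambda)]^{-1}$ directly from the definitions of
$\lambda$ and $\mu$ for $\ell(x) = 1/(u-x)$ and compares it with the closed form of
Proposition 3.5.7; it assumes $0 < r < u$ throughout.

```python
import math


def mm1_anarchy_value(u, r):
    """[lambda*mu + (1 - lambda)]^(-1) for l(x) = 1/(u - x) at traffic rate r < u,
    computed directly from the definitions of lambda and mu."""
    lam = (u - math.sqrt(u * (u - r))) / r  # solves u/(u - lam r)^2 = 1/(u - r)
    mu = (u - r) / (u - lam * r)            # l(lam r) / l(r)
    return 1.0 / (lam * mu + (1.0 - lam))


def mm1_price_of_anarchy(u_min, r_max):
    """Closed form of Proposition 3.5.7 (requires r_max < u_min)."""
    return 0.5 * (1.0 + math.sqrt(u_min / (u_min - r_max)))


if __name__ == "__main__":
    print(mm1_anarchy_value(2.0, 1.0), mm1_price_of_anarchy(2.0, 1.0))
    # The bound degrades as r_max approaches u_min.
    for r_max in (0.5, 0.9, 0.99, 0.9999):
        print(r_max, mm1_price_of_anarchy(1.0, r_max))
```

The second loop illustrates how the value grows without bound as $R_{\max}$ approaches
$u_{\min}$.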

Remark  3.5.8 

(1) We note that the above class $\mathcal{L}$ is not diverse (since
$\ell(0) \le 1/u_{\min}$ for all $\ell \in \mathcal{L}$); indeed, the worst-possible
value of the ratio $\rho$ need not be achieved on a set of parallel links for this
class of latency functions (cf. Theorem 3.4.4).$^6$ Thus, the extensions provided in
Subsection 3.5.1 are crucial in this application.

(2) The anarchy value of $\mathcal{L}$ and hence the worst possible value of $\rho$ go
to $+\infty$ as $R_{\max} \to u_{\min}$; fulfilling a previous promise, this shows that
the hypothesis that $R_{\max}$ is bounded away from $u_{\min}$ is necessary for the
price of anarchy for networks with M/M/1 delay functions to be finite.

$^6$ However, the proof of Lemma 3.5.1 shows that subdivisions of parallel networks
suffice to achieve the worst-case bound.
Delay Functions of M/G/1 Queues

As a final example, we extend the preceding analysis to queues that need not have
exponentially distributed service times, that is, to M/G/1 delay functions (we retain
our assumptions of a single queue and Poisson arrivals). Our solution will not be as
clean as in the M/M/1 case, but will demonstrate that our techniques for computing the
price of anarchy remain useful even for relatively complex classes of allowable
latency functions.

Recall that if a queue service distribution (specifying the number of customers
served in a time step) has expectation $\mu$ and standard deviation $\sigma$ (both of
which we assume to be finite), then the expected waiting time under Poisson arrivals
with rate $\lambda$ is

$$\frac{1}{\mu} + \frac{\lambda(1 + \sigma^2\mu^2)}{2\mu(\mu - \lambda)};$$

see [80] or [94] for a derivation. To rephrase this formula in our usual notation, we
view the parameter $\mu$ as the edge capacity $u$ and the Poisson rate $\lambda$ as the
amount of traffic assigned to an edge; we are then interested in latency functions
$\ell$ of the following form:

$$\ell(x) = \frac{1}{u} + \frac{x(1 + \sigma^2 u^2)}{2u(u - x)}.$$

As in the M/M/1 case, to achieve an interesting result we will need to assume a
minimum allowable capacity $u_{\min}$ and a maximum allowable sum of all traffic rates
$R_{\max} < u_{\min}$.

The anarchy value of such a function can be computed by the same method as for the
M/M/1 case (though the calculations are more tedious). It turns out that the anarchy
value of a latency function $\ell$ with the above formula is

$$\alpha(\ell) = \left(1 + \sqrt{\frac{u}{u - R_{\max}}}\right)\frac{2u + R_{\max}(\sigma^2u^2 - 1)}{4u + \left(u + R_{\max} - \sqrt{u(u - R_{\max})}\right)(\sigma^2u^2 - 1)}.$$


Applying  Theorem  3.5.3,  we  obtain  the  following  proposition. 


Proposition 3.5.9 Let $\mathcal{L}$ be a non-empty collection of M/G/1 delay functions
with expected service rate at least $u_{\min}$. Then, the price of anarchy for
instances with latency functions in $\mathcal{L}$ and sum of all traffic rates at most
$R_{\max} < u_{\min}$ is precisely

$$\sup_{\ell\in\mathcal{L}} \left(1 + \sqrt{\frac{u_\ell}{u_\ell - R_{\max}}}\right)\frac{2u_\ell + R_{\max}(\sigma_\ell^2u_\ell^2 - 1)}{4u_\ell + \left(u_\ell + R_{\max} - \sqrt{u_\ell(u_\ell - R_{\max})}\right)(\sigma_\ell^2u_\ell^2 - 1)},$$

where $u_\ell$ and $\sigma_\ell$ denote the expectation and standard deviation of the
service rate distribution associated with $\ell$.
Without more assumptions on the class $\mathcal{L}$, we cannot simplify the expression
of Proposition 3.5.9 further; this reflects the relative complexity of M/G/1 delay
functions (which are specified by two independent parameters $u_\ell$ and
$\sigma_\ell$, unlike the simpler M/M/1 case). On the other hand, reducing the
computation of the price of anarchy to computing the expression of Proposition 3.5.9
is both nontrivial and useful. When the class $\mathcal{L}$ possesses structure beyond
merely being some collection of M/G/1 delay functions, the expression of
Proposition 3.5.9 may become simple and transparent (as in the special case of M/M/1
delay functions, where $\sigma_\ell u_\ell = 1$ for all $\ell$). Even for classes for
which no analytical simplification is possible, Proposition 3.5.9 should permit the
approximate (if not exact) computation of the price of anarchy with respect to
$\mathcal{L}$ by straightforward numerical methods; in the simplest case where
$\mathcal{L}$ is finite and not astronomically large (and we suspect almost all
classes of M/G/1 delay functions can be closely approximated by such an
$\mathcal{L}$), the price of anarchy can be computed simply by enumeration. We note
that without the assurance that simple network topologies always provide worst-case
examples, an enumerative approach to computing the price of anarchy would be
unthinkable.
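
The enumerative computation mentioned above might look as follows for a finite class.
This is only a sketch under our own conventions: each M/G/1 delay function is
represented by a hypothetical pair `(u, sigma)`, and the code simply evaluates the
expression of Proposition 3.5.9 for each pair and takes the maximum.

```python
import math


def mg1_anarchy_value(u, sigma, r_max):
    """Anarchy value of the M/G/1 delay function with expected service rate u
    and standard deviation sigma, for traffic rates at most r_max < u."""
    k = sigma * sigma * u * u - 1.0
    root = math.sqrt(u * (u - r_max))
    first = 1.0 + math.sqrt(u / (u - r_max))
    return first * (2.0 * u + r_max * k) / (4.0 * u + (u + r_max - root) * k)


def mg1_price_of_anarchy(functions, r_max):
    """Enumerate a finite class, given as (u, sigma) pairs, and return the
    largest anarchy value (the expression of Proposition 3.5.9)."""
    return max(mg1_anarchy_value(u, sigma, r_max) for u, sigma in functions)


if __name__ == "__main__":
    example_class = [(2.0, 0.5), (2.0, 0.0), (3.0, 0.2)]  # illustrative values only
    print(mg1_price_of_anarchy(example_class, r_max=1.0))
```

When `sigma * u == 1` the value reduces to the M/M/1 expression
$\frac{1}{2}(1 + \sqrt{u/(u - R_{\max})})$, which gives a quick sanity check.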

3.6 A Bicriteria Bound for Networks with Arbitrary Latency Functions

The nonlinear variant of Pigou's example (Subsection 2.4.4) shows that, assuming
only continuity and monotonicity of the edge latency functions, the price of anarchy
cannot be bounded above (even as a function of the network size). On the other
hand, this example does not rule out interesting bicriteria results. Toward this end,
we compare the cost of a flow at Nash equilibrium to that of an optimal flow feasible
for increased rates.$^7$ In the example of Subsection 2.4.4, an optimal flow feasible
for rate $r > 1$ assigns the additional flow to the upper link, now incurring a cost
that tends to $r - 1$ as $p \to \infty$. In particular, for any $p$ an optimal flow
feasible for twice the rate ($r = 2$) has total latency at least that of the flow at
Nash equilibrium (feasible for the original rates). We next prove that this statement
holds in any network with continuous, nondecreasing edge latency functions.

$^7$ This approach is in the spirit of the analyses of online scheduling algorithms via
resource augmentation given by Kalyanasundaram and Pruhs [91] and Phillips et al. [147].
[Figure 3.1, two panels, horizontal axes labeled "flow": (a) graph of latency function
$\ell_e$ and its value at flow value $f_e$; (b) graph of latency function
$\bar{\ell}_e$.]

Figure 3.1: Construction in the proof of Theorem 3.6.1 of modified latency function
$\bar{\ell}_e$ given original latency function $\ell_e$ and Nash flow value $f_e$.
Solid lines denote graphs of functions.


Theorem 3.6.1 If $f$ is a flow at Nash equilibrium for $(G,r,\ell)$ and $f^*$ is
feasible for $(G,2r,\ell)$, then $C(f) \le C(f^*)$.


Proof. Suppose $f$, $f^*$ satisfy the hypotheses of the theorem. For $i = 1,\ldots,k$,
let $L_i(f)$ be the latency of an $s_i$-$t_i$ flow path of $f$, so that
$C(f) = \sum_i L_i(f)\,r_i$ (see Proposition 2.2.4). We seek a set of latency functions
$\bar{\ell}$ that on one hand approximates the original ones (in the sense that the
cost of a flow with respect to latency functions $\bar{\ell}$ is close to its original
cost) and, on the other hand, allows us to easily lower bound the cost (with respect to
$\bar{\ell}$) of any feasible flow. With this goal in mind, we define new latency
functions $\bar{\ell}$ as follows:

$$\bar{\ell}_e(x) = \begin{cases} \ell_e(f_e) & \text{if } x \le f_e \\ \ell_e(x) & \text{if } x \ge f_e. \end{cases}$$

Figure  3.1  illustrates  this  construction. 

First we compare the cost of the flow $f^*$ under the new latency functions
$\bar{\ell}$ to its original cost $C(f^*)$. For any edge $e$,
$\bar{\ell}_e(x) - \ell_e(x)$ is zero for $x \ge f_e$ and bounded above by
$\ell_e(f_e)$ for $x < f_e$, so $x(\bar{\ell}_e(x) - \ell_e(x)) \le \ell_e(f_e)f_e$ for
all $x \ge 0$. Notice that the left-hand side (the discrepancy between
$x\bar{\ell}_e(x)$ and $x\ell_e(x)$) is maximized when $x$ is slightly smaller than
$f_e$ and when $\ell_e(x) = 0$; in this case, the value of the left-hand side is
essentially the area of the rectangle enclosed by dashed lines in Figure 3.1(a). The
difference between the new cost (with respect to $\bar{\ell}$) and the old cost (with
respect to $\ell$) can now be bounded as follows:

$$\sum_{e\in E} \bar{\ell}_e(f_e^*)f_e^* - C(f^*) \;=\; \sum_{e\in E} f_e^*\left(\bar{\ell}_e(f_e^*) - \ell_e(f_e^*)\right) \;\le\; \sum_{e\in E} \ell_e(f_e)f_e \;=\; C(f).$$

In other words, evaluating $f^*$ with latency functions $\bar{\ell}$ (rather than
$\ell$) increases its cost by at most an additive $C(f)$ factor.

On the other hand, if $f_0$ denotes the zero flow in $G$, then by construction
$\bar{\ell}_P(f_0) \ge L_i(f)$ for any path $P \in \mathcal{P}_i$. Since
$\bar{\ell}_e$ is nondecreasing for each edge $e$, it follows that
$\bar{\ell}_P(f^*) \ge L_i(f)$ for each path $P \in \mathcal{P}_i$. Thus, the cost of
$f^*$ with respect to $\bar{\ell}$ can be bounded below in the following manner:

$$\sum_{P} \bar{\ell}_P(f^*)f_P^* \;\ge\; \sum_i \sum_{P\in\mathcal{P}_i} L_i(f)f_P^* \;=\; \sum_i 2L_i(f)\,r_i \;=\; 2C(f).$$

Combining these two results we obtain the theorem:

$$C(f^*) \;\ge\; \sum_{P} \bar{\ell}_P(f^*)f_P^* - C(f) \;\ge\; 2C(f) - C(f) \;=\; C(f).$$


The  same  proof  also  shows  the  following  more  general  result. 

Theorem 3.6.2 If $f$ is a flow at Nash equilibrium for $(G,r,\ell)$ and $f^*$ is
feasible for $(G,(1+\gamma)r,\ell)$, then $C(f) \le \frac{1}{\gamma}C(f^*)$.

Remark 3.6.3 Referring back to the network of Subsection 2.4.4 (the network with
two nodes and two edges with latency functions $\ell(x) = 1$ and $\ell(x) = x^p$), we
see that Theorem 3.6.2 is essentially tight for all values of $\gamma$. More precisely,
by taking $p$ sufficiently large we can obtain an instance admitting an optimal flow
feasible for a traffic rate arbitrarily close to $(1+\gamma)$ with cost strictly less
than $\gamma$ (recall the cost of the flow at Nash equilibrium for the original rate
$r = 1$ is 1) and an optimal flow feasible for rate $1+\gamma$ with cost arbitrarily
close to $\gamma$.
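
The tightness claim can be explored numerically on the two-link network with latency
functions $1$ and $x^p$. In the sketch below (names are ours), the Nash flow at rate 1
has cost 1, and the optimal cost at rate $1+\gamma$ is computed from the first-order
condition $(p+1)x^p = 1$ on the $x^p$ link, which we assume characterizes the optimum
whenever the rate is at least 1.

```python
NASH_COST = 1.0  # at rate 1 all traffic uses the x**p link, whose latency is then 1


def optimal_cost(p, rate):
    """Minimum total latency for traffic `rate` >= 1 in the two-link network
    with latency functions 1 and x**p (p >= 1)."""
    x = (p + 1) ** (-1.0 / p)  # flow on the x**p link, where (p + 1) x**p = 1
    return x ** (p + 1) + (rate - x)


def bicriteria_bound_holds(p, gamma):
    """Check C(f) <= C(f*) / gamma for the optimal flow f* at rate 1 + gamma."""
    return NASH_COST <= optimal_cost(p, 1.0 + gamma) / gamma


if __name__ == "__main__":
    for p in (1, 10, 100, 1000):
        # With gamma = 1 the optimal cost at rate 2 approaches 1 from above.
        print(p, optimal_cost(p, 2.0), bicriteria_bound_holds(p, 1.0))
```

As $p$ grows, the optimal cost at rate $1+\gamma$ approaches $\gamma$ from above, so
the inequality of Theorem 3.6.2 becomes nearly an equality on this family.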
Theorem 3.6.1 has a natural interpretation for networks with the M/M/1 delay
functions mentioned in the previous section. To see this, we first note that comparing
a Nash flow to an optimal flow forced to route more traffic is the same as comparing a
Nash flow with "faster" latency functions to an optimal flow in the original network.
Formally, we have the following corollary of Theorem 3.6.1.

Corollary 3.6.4 Let $(G,r,\ell)$ be an instance and define the modified latency
function $\tilde{\ell}_e$ by $\tilde{\ell}_e(x) = \frac{1}{2}\ell_e(x/2)$ for each edge
$e$. If $\tilde{f}$ is a flow at Nash equilibrium for $(G,r,\tilde{\ell})$ and $f^*$ is
feasible for $(G,r,\ell)$, then the cost of $\tilde{f}$ (with respect to latency
functions $\tilde{\ell}$) is at most the cost of $f^*$ (with respect to latency
functions $\ell$).

Proof. Let $f$ be a Nash flow for $(G,r/2,\ell)$ and $f^*$ a flow feasible for
$(G,r,\ell)$; by Theorem 3.6.1,
$\sum_e \ell_e(f_e)f_e \le \sum_e \ell_e(f_e^*)f_e^*$. Now consider the flow
$\tilde{f} = 2f$, viewed as a feasible flow for $(G,r,\tilde{\ell})$. Since
$\tilde{\ell}_e(\tilde{f}_e) = \frac{1}{2}\ell_e(f_e)$ for each edge $e$ and $f$ is a
Nash flow for $(G,r/2,\ell)$, $\tilde{f}$ is a Nash flow for $(G,r,\tilde{\ell})$;
moreover,

$$\sum_e \tilde{\ell}_e(\tilde{f}_e)\tilde{f}_e \;=\; \sum_e \left(\tfrac{1}{2}\ell_e(f_e)\right)(2f_e) \;=\; \sum_e \ell_e(f_e)f_e.$$

We have shown that
$\sum_e \tilde{\ell}_e(\tilde{f}_e)\tilde{f}_e \le \sum_e \ell_e(f_e^*)f_e^*$ with
$\tilde{f}$ at Nash equilibrium for $(G,r,\tilde{\ell})$; the corollary now follows
from the essential uniqueness of Nash flows (Corollary 2.5.3). ■

Now consider the special case of an instance $(G,r,\ell)$ in which all latency
functions are the delay functions of M/M/1 queues, and thus for each edge $e$ we have
$\ell_e(x) = (u_e - x)^{-1}$ on $[0,u_e)$ for some edge capacity $u_e$. Assume for
simplicity that every flow $f$ feasible for $(G,r,\ell)$ satisfies $f_e < u_e$ for each
edge $e$ (e.g., by insisting that the minimum edge capacity is larger than the sum of
all traffic rates), so that we can ignore the restricted domains of the latency
functions. Here, if $\ell_e(x) = 1/(u_e - x)$ then
$\tilde{\ell}_e(x) = 1/(2(u_e - x/2)) = 1/(2u_e - x)$. Thus in a network with M/M/1
delay functions, Corollary 3.6.4 offers the following advice: to match the performance
of a centrally controlled network with selfish routing, simply double the capacity of
every edge.
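
The capacity-doubling advice can be tried out on a small example. The sketch below is
restricted, by our own choice, to two parallel M/M/1 links with capacities `u1` and `u2`
and a traffic rate `r` smaller than both; it finds the Nash and optimal splits by
bisection on the (increasing) difference of latencies or of marginal costs. All
function names are hypothetical.

```python
def split(diff, r, iters=100):
    """Return x in [0, r] with diff(x) = 0, or the appropriate endpoint, where
    diff is an increasing function of the flow x on link 1."""
    if diff(0.0) >= 0.0:
        return 0.0
    if diff(r) <= 0.0:
        return r
    lo, hi = 0.0, r
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if diff(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def cost(x, r, u1, u2):
    """Total latency of routing x on link 1 and r - x on link 2."""
    return x / (u1 - x) + (r - x) / (u2 - (r - x))


def nash_cost(r, u1, u2):
    """Cost of the flow equalizing latencies 1/(u - x) on the used links."""
    x = split(lambda x: 1.0 / (u1 - x) - 1.0 / (u2 - (r - x)), r)
    return cost(x, r, u1, u2)


def optimal_cost(r, u1, u2):
    """Cost of the flow equalizing marginal costs u/(u - x)^2 on the used links."""
    x = split(lambda x: u1 / (u1 - x) ** 2 - u2 / (u2 - (r - x)) ** 2, r)
    return cost(x, r, u1, u2)


if __name__ == "__main__":
    r, u1, u2 = 1.0, 1.2, 1.5
    print("optimal, original capacities:", optimal_cost(r, u1, u2))
    print("Nash, original capacities:   ", nash_cost(r, u1, u2))
    print("Nash, doubled capacities:    ", nash_cost(r, 2 * u1, 2 * u2))
```

In this example the Nash flow in the doubled network costs less than the optimal flow in
the original network, as Corollary 3.6.4 predicts.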

Remark 3.6.5 We made the strong assumption that every feasible flow $f$ satisfies
$f_e < u_e$ for each edge $e$ for simplicity. We can alternatively define the latency
of an edge with an M/M/1 delay function and capacity $u$ to be $+\infty$ on
$[u,\infty)$; arithmetic with $+\infty$ is defined in the usual way. Some care must be
taken with this approach, however, as the essential uniqueness of Nash flows
(Proposition 2.5.1 and Corollary 2.5.3) fails when edge latencies can take on
$+\infty$ as a value; in particular, some but not all Nash flows may have infinite
cost. The guarantee of Theorem 3.6.1 also fails for networks in which Nash flows may
have infinite cost. What remains true is this: doubling the capacity of a network with
M/M/1 delay functions is guaranteed to offset the cost of selfish routing, assuming
only that all Nash flows in the augmented network possess finite cost. This
significantly weakens our earlier assumption that all feasible flows have finite cost
in the original network.


Chapter  4 

Extensions  to  Other  Models 


In  Chapter  3  we  studied  the  price  of  anarchy  in  the  network  model  put  forth  in 
Chapter  2.  In  this  chapter  we  show  how  this  work  can  be  extended  both  to  more 
realistic  models  of  network  routing  and  to  a  broader  class  of  games.  We  do  not 
endeavor  to  generalize  all  of  the  results  of  the  previous  chapter  to  the  greatest 
possible  extent;  we  merely  wish  to  point  out  that  our  techniques  are  not  entirely 
model-specific,  and  to  indicate  some  directions  in  which  they  are  readily  extended. 

The  extensions  presented  in  the  first  three  sections  of  this  chapter  are  motivated 
by  some  of  the  deficiencies  of  the  traffic  model  defined  in  Chapter  2.  First,  net¬ 
work  users  can  often  only  evaluate  path  latency  approximately,  rather  than  exactly. 
Section  4.1  extends  the  notion  of  a  flow  at  Nash  equilibrium  and  Theorems  3.2.6 
and  3.6.1  to  this  setting.  Second,  our  basic  model  represents  an  idealized  scenario 
with  infinitely  many  users  each  controlling  a  negligible  fraction  of  the  overall  traffic, 
while  in  reality  we  encounter  a  finite  number  of  network  users,  each  controlling  a 
strictly positive amount of traffic. In Section 4.2 we prove an analogue of Theorem 3.6.1
for the case of finitely many network users, provided each user can route
its flow fractionally over any number of paths. In Section 4.3 we show that such an
assumption is essentially necessary, in that no bicriteria bound analogous to Theorem 3.6.1
holds when there are only finitely many network users, each of whom
must  route  its  flow  on  a  single  path;  however,  a  version  of  Theorem  3.6.1  does  hold 
if  network  users  do  not  control  too  much  flow  and  the  edge  latency  functions  are  not 
too steep. In the last section (Section 4.4), we show how all of the results of Chapter 3
can be extended to a broad class of games (that need not involve a network)
previously  studied  in  the  game  theory  literature. 

4.1 Flows at Approximate Nash Equilibrium

It  is  often  unreasonable  to  expect  network  users  to  be  able  to  evaluate  the  latency 
of  different  paths  with  arbitrary  precision.  We  next  investigate  the  sensitivity  of  our 
results  to  this  assumption.  We  suppose  that  a  network  user  can  only  distinguish 
between paths that differ significantly in their latency (say by more than a $(1+\epsilon)$
factor for some $\epsilon > 0$). Our definition of a flow at $\epsilon$-approximate Nash equilibrium
is then an obvious modification of Definition 2.2.1:

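Definition 4.1.1 A flow $f$ feasible for instance $(G, r, \ell)$ is at $\epsilon$-approximate Nash
equilibrium if for all $i \in \{1, \ldots, k\}$, $P_1, P_2 \in \mathcal{P}_i$, and $\delta \in (0, f_{P_1}]$, we have
$\ell_{P_1}(f) \le (1+\epsilon)\,\ell_{P_2}(\tilde{f})$, where

$$\tilde{f}_P = \begin{cases} f_P - \delta & \text{if } P = P_1 \\ f_P + \delta & \text{if } P = P_2 \\ f_P & \text{if } P \notin \{P_1, P_2\}. \end{cases}$$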

The  analogue  of  Proposition  2.2.2  is  then: 

Lemma 4.1.2 A flow $f$ is at $\epsilon$-approximate Nash equilibrium if and only if for
every $i \in \{1, \ldots, k\}$ and $P_1, P_2 \in \mathcal{P}_i$ with $f_{P_1} > 0$, $\ell_{P_1}(f) \le (1+\epsilon)\,\ell_{P_2}(f)$.
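
The condition of Lemma 4.1.2 is easy to test mechanically. The following is a minimal sketch, not part of the original development: the data layout (dictionaries of paths and edge latency functions) and the names `path_latencies` and `is_approx_nash` are assumptions made for illustration, and path names are assumed to be unique across commodities.

```python
def path_latencies(paths, edge_latency, path_flow):
    """Latency of every path under the given path flow.

    paths: dict path_name -> list of edge names (hypothetical layout)
    edge_latency: dict edge name -> nondecreasing function of the edge flow
    path_flow: dict path_name -> amount of flow routed on that path
    """
    edge_flow = {e: 0.0 for e in edge_latency}
    for p, edges in paths.items():
        for e in edges:
            edge_flow[e] += path_flow.get(p, 0.0)
    return {p: sum(edge_latency[e](edge_flow[e]) for e in edges)
            for p, edges in paths.items()}


def is_approx_nash(commodities, edge_latency, path_flow, eps, tol=1e-12):
    """Test of Lemma 4.1.2: each used path costs at most (1 + eps) times
    the cheapest path of the same commodity.

    commodities: list of dicts path_name -> list of edge names, one per i
    """
    all_paths = {p: edges for paths in commodities for p, edges in paths.items()}
    lat = path_latencies(all_paths, edge_latency, path_flow)
    for paths in commodities:
        cheapest = min(lat[p] for p in paths)
        for p in paths:
            if path_flow.get(p, 0.0) > 0 and lat[p] > (1 + eps) * cheapest + tol:
                return False
    return True


# Two parallel links with latencies x and 1, one commodity of rate 1.
links = [{"top": ["e1"], "bottom": ["e2"]}]
latency = {"e1": lambda x: x, "e2": lambda x: 1.0}
print(is_approx_nash(links, latency, {"top": 0.8, "bottom": 0.2}, eps=0.25))
```

In this small instance the flow (0.8, 0.2) has path latencies 0.8 and 1, so it passes the test exactly when $\epsilon \ge 0.25$.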

The next theorem provides an analogue of Theorem 3.6.1 for flows at $\epsilon$-approximate
Nash equilibrium.


Theorem 4.1.3 If $f$ is at $\epsilon$-approximate Nash equilibrium with $\epsilon < 1$ for $(G, r, \ell)$
and $f^*$ is feasible for $(G, 2r, \ell)$, then $C(f) \le \frac{1+\epsilon}{1-\epsilon} C(f^*)$.

Proof. Suppose $f$, $f^*$ satisfy the hypotheses of the theorem. For $i = 1, \ldots, k$, let
$L_i(f)$ be the minimum latency of any $s_i$-$t_i$ path (with respect to $f$); since $f$ is at
$\epsilon$-approximate Nash equilibrium, every $s_i$-$t_i$ flow path has latency at most $(1+\epsilon) L_i(f)$
and hence $C(f) \le (1+\epsilon) \sum_i L_i(f) r_i$.

As in the proof of Theorem 3.6.1, we define a new set of latency functions $\bar{\ell}$ by

$$\bar{\ell}_e(x) = \begin{cases} \ell_e(f_e) & \text{if } x \le f_e \\ \ell_e(x) & \text{if } x \ge f_e. \end{cases}$$

As before, the cost of a flow with respect to $\bar{\ell}$ exceeds its cost with respect to $\ell$ by
at most an additive factor of $C(f)$.

Letting $f_0$ denote the zero flow in $G$, we have $\bar{\ell}_P(f_0) \ge L_i(f)$ for any path $P \in \mathcal{P}_i$.
Since $\bar{\ell}_e$ is nondecreasing for each edge $e$, it follows that $\bar{\ell}_P(f^*) \ge L_i(f)$ for each
path $P \in \mathcal{P}_i$. This allows us to bound the cost of $f^*$ with respect to $\bar{\ell}$ from below:

$$\sum_P \bar{\ell}_P(f^*) f^*_P \;\ge\; \sum_i \sum_{P \in \mathcal{P}_i} L_i(f) f^*_P \;=\; \sum_i 2 L_i(f) r_i \;\ge\; \frac{2}{1+\epsilon}\, C(f).$$

To conclude, we derive

$$C(f^*) \;\ge\; \sum_P \bar{\ell}_P(f^*) f^*_P - C(f) \;\ge\; \frac{2}{1+\epsilon}\, C(f) - C(f) \;=\; \frac{1-\epsilon}{1+\epsilon}\, C(f).$$


Remark  4.1.4  A  simple  example  in  a  network  similar  to  that  of  Braess’s  Paradox 
(Figure 1.2(b)) shows that the factor of $\frac{1+\epsilon}{1-\epsilon}$ cannot be improved (see Section B.2).
However, it is not difficult to improve this factor to $1+\epsilon$ in networks of parallel
links (see Section B.2); this provides a counterpoint to our work in Sections 3.3-3.4
showing  that  worst-case  examples  for  the  price  of  anarchy  for  flows  at  exact  Nash 
equilibrium  always  occur  in  networks  of  parallel  links. 

Further extensions of the theorems of Chapter 3 to the current setting are possible.
As an example, we sketch an approximate version of Theorem 3.2.6 for networks
with linear latency functions. As in Section 3.2, the idea is to start with a flow $f$ at
$\epsilon$-approximate Nash equilibrium and consider the scaled-down flow $f/2$. The claim
$C(f/2) \ge \frac{1}{4} C(f)$ holds as in Section 3.2, but now $f/2$ is only approximately optimal
for $(G, r/2, \ell)$; because of this, proving that an augmentation from $f/2$ to a flow
feasible for $(G, r, \ell)$ is costly will require a bit of care.

Next, let $L_i(f)$ denote the minimum latency of any $s_i$-$t_i$ path with respect to $f$.
The following are true:

(1) $C(f) \le (1+\epsilon) \sum_{i=1}^{k} L_i(f) r_i$.

(2) If $P$ is an $s_i$-$t_i$ path, then the marginal cost of $P$ with respect to $f/2$ is at
least $L_i(f)$.

(3) If $P$ is an $s_i$-$t_i$ flow path of $f$, then the marginal cost of $P$ with respect to $f/2$
is at most $(1+\epsilon) L_i(f)$.

Now consider augmenting $f/2$ to a flow $f^*$ optimal for $(G, r, \ell)$. As in Lemmas 3.2.4
and 3.3.5, convexity of the objective function $C(\cdot)$ with linear (or more generally,
standard) latency functions allows us to lower bound the cost of this augmentation
on each edge by the change in flow value times the marginal cost with respect to $f/2$.
At worst, this augmentation will remove $r_i/2$ units of flow between each commodity
$i$ at a marginal benefit of $(1+\epsilon) L_i(f)$ per flow unit and will add $r_i$ units of flow at
a marginal cost of $L_i(f)$ per flow unit. This argument gives

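$$\begin{aligned}
C(f^*) &\ge \frac{1}{4} C(f) - (1+\epsilon) \sum_{i=1}^{k} L_i(f) \frac{r_i}{2} + \sum_{i=1}^{k} L_i(f) r_i \\
       &= \frac{1}{4} C(f) + \frac{1-\epsilon}{2} \sum_{i=1}^{k} L_i(f) r_i \\
       &\ge \frac{3-\epsilon}{4+4\epsilon}\, C(f).
\end{aligned}$$

A straightforward modification of the bad example of Section B.2 for Theorem 4.1.3
shows that the factor $\frac{4+4\epsilon}{3-\epsilon}$ is best possible for $\epsilon < 1$.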
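
For completeness, the final step of the bound above can be spelled out. This is a worked step added here rather than taken from the original; it uses only fact (1) and the assumption $\epsilon \le 1$, which makes the coefficient $\frac{1-\epsilon}{2}$ nonnegative.

```latex
\begin{align*}
\frac{1}{4} C(f) + \frac{1-\epsilon}{2} \sum_{i=1}^{k} L_i(f)\, r_i
  &\ge \frac{1}{4} C(f) + \frac{1-\epsilon}{2} \cdot \frac{C(f)}{1+\epsilon}
     && \text{by (1), since } 1-\epsilon \ge 0 \\
  &= \frac{(1+\epsilon) + 2(1-\epsilon)}{4(1+\epsilon)}\, C(f)
   = \frac{3-\epsilon}{4+4\epsilon}\, C(f).
\end{align*}
```

At $\epsilon = 0$ the constant is $3/4$, which matches the familiar $4/3$ bound for linear latency functions at exact Nash equilibrium.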

4.2  Finitely  Many  Users:  Splittable  Flow 

Our  basic  model  makes  the  convenient  assumption  that  there  are  an  infinite  number 
of  noncooperative  network  users,  each  controlling  a  negligible  fraction  of  the  overall 
traffic.  In  this  section  we  extend  the  basic  model  to  the  case  of  finitely  many  network 
users,  each  of  whom  controls  a  strictly  positive  amount  of  traffic.  In  this  section  we 
allow  a  network  user  to  split  flow  among  any  number  of  paths;  this  model  has  been 
studied  extensively  in  the  networking  literature  [4,  5,  8,  25,  59,  138].  In  the  next 
section  we  will  investigate  the  setting  in  which  each  network  user  must  route  all  of 
its  flow  on  a  single  path. 

We are given a network $G$ with continuous nondecreasing latency functions $\ell$ as
before, and in addition $k$ users. We assume that user $i$ intends to send $r_i$ units of flow
from source $s_i$ to destination $t_i$. Distinct users may have identical source-destination
pairs. We continue to denote an instance by $(G, r, \ell)$, and we call the instance finite
splittable. A flow $f$ now consists of $k$ functions, with one function $f^{(i)} : \mathcal{P}_i \to \mathcal{R}^+$ for
each user $i$. For a flow $f$, we denote by $C_i(f)$ the total latency experienced by user $i$;
thus, $C_i(f) = \sum_{P \in \mathcal{P}_i} \ell_P(f) f^{(i)}_P$. As usual, a flow is at Nash equilibrium if no user can
decrease the latency it experiences by rerouting its flow. In this setting, a flow $f$ is
at Nash equilibrium if and only if for each $i$, $f^{(i)}$ minimizes $C_i(f)$ given $f^{(j)}$ for $j \ne i$.
We will focus on networks with standard latency functions (see Definition 2.3.5);
under this assumption, results of Rosen [156] imply that a flow at Nash equilibrium
must exist (see [138] for a proof sketch).
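
To make the best-response characterization concrete, the sketch below evaluates $C_i(f)$ in a small hypothetical instance that does not appear in the original: two parallel links with latencies $\ell_1(x) = x$ and $\ell_2(x) = 1$, and two users each controlling $1/2$ unit of flow. The grid search is only an illustrative stand-in for exact minimization, and the function names are assumptions.

```python
def user_cost(x_self, x_other, rate):
    """C_i(f) for a user sending x_self on link 1 and rate - x_self on link 2.

    Hypothetical instance: link 1 has latency l_1(x) = x, link 2 has l_2(x) = 1;
    x_other is the amount the other user sends on link 1.
    """
    load1 = x_self + x_other          # total flow on link 1
    return x_self * load1 + (rate - x_self) * 1.0


def best_response(x_other, rate, steps=3000):
    """Grid search for the split minimizing the user's own total latency."""
    grid = [rate * j / steps for j in range(steps + 1)]
    return min(grid, key=lambda x: user_cost(x, x_other, rate))


# If the other user sends 1/3 on link 1, the best response is again 1/3,
# so the symmetric split (1/3 on link 1, 1/6 on link 2) is a Nash equilibrium.
print(best_response(1 / 3, 0.5))
print(2 * user_cost(1 / 3, 1 / 3, 0.5))   # total cost C(f) of this flow
```

In this instance the equilibrium has total cost $7/9$. Each user accounts for the congestion its own flow causes, so it keeps some flow off the congestible link.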

Our  main  result  for  this  model  is  an  analogue  of  Theorem  3.6.1. 

Theorem 4.2.1 If $f$ is at Nash equilibrium for the finite splittable instance $(G, r, \ell)$
with standard latency functions and $f^*$ is feasible for the finite splittable instance
$(G, 2r, \ell)$, then $C(f) \le C(f^*)$.

Proof. Fix $f$, $f^*$ and define latency functions $\bar{\ell}$ as in the proofs of Theorems 3.6.1
and 4.1.3. As in those proofs, evaluating $f^*$ with latency functions $\bar{\ell}$ (rather than
$\ell$) increases its cost by at most an additive $C(f)$ factor.

We claim that $f$ is optimal for the instance $(G, r, \bar{\ell})$. We proceed by contradiction,
showing that if $f$ is not optimal for $(G, r, \bar{\ell})$ then $f$ fails to be at Nash equilibrium
for $(G, r, \ell)$. Suppose $f$ is not optimal; since the instance $(G, r, \bar{\ell})$ defines
a convex optimization problem of the form (NLP) (see Section 2.3), by Proposition 2.3.1
there are two paths $P_1, P_2$, a user $i$ such that $P_1, P_2 \in \mathcal{P}_i$ with $f^{(i)}_{P_1} > 0$,
and a sufficiently small $\delta \in (0, f^{(i)}_{P_1}]$ such that moving $\delta$ units of flow from $P_1$ to
$P_2$ yields a new flow with cost (with respect to $\bar{\ell}$) strictly less than that of $f$. Our
goal is to show that the same local move will be beneficial for user $i$ in the instance
$(G, r, \ell)$. We may assume that $P_1, P_2$ are disjoint (otherwise, the following argument
applies to the symmetric difference of $P_1$ and $P_2$). The benefit (with respect to
$\bar{\ell}$) of removing $\delta$ units of flow from path $P_1$ is then $\delta \cdot \bar{\ell}_{P_1}(f) = \delta \cdot \ell_{P_1}(f)$ (since
$\bar{\ell}_e(x) = \ell_e(f_e)$ when $x \le f_e$) while the cost (with respect to $\bar{\ell}$) of adding $\delta$ units of
flow to $P_2$ is $\sum_{e \in P_2} [\ell_e(f_e + \delta)(f_e + \delta) - \ell_e(f_e) f_e]$; we are assuming that the former
exceeds the latter. On the other hand, user $i$ is capable of making an identical local
change to $f^{(i)}$ in the instance $(G, r, \ell)$, and doing so provides a benefit to user $i$ of
at least $\delta \cdot \ell_{P_1}(f)$ with respect to $\ell$ (since latency functions are nondecreasing) and
a cost (with respect to $\ell$) of

$$\sum_{e \in P_2} \left[ \ell_e(f_e + \delta)\,(f^{(i)}_e + \delta) - \ell_e(f_e)\, f^{(i)}_e \right]$$

which is at most

$$\sum_{e \in P_2} \left[ \ell_e(f_e + \delta)\,(f_e + \delta) - \ell_e(f_e)\, f_e \right]$$

since $\ell_e$ is nondecreasing and $f^{(i)}_e \le f_e$ for each edge $e$. Thus, moving $\delta$ units of flow
from path $P_1$ to path $P_2$ yields a better outcome for user $i$ in the instance $(G, r, \ell)$,
so $f$ fails to be at Nash equilibrium for $(G, r, \ell)$.

We have determined that any flow feasible for $(G, r, \bar{\ell})$ must have cost at least
$C(f)$. Since every latency function is nondecreasing, it follows that any flow feasible
for $(G, 2r, \bar{\ell})$ must have cost at least $2C(f)$: such a flow may be expressed as the
sum of two flows feasible for $(G, r, \bar{\ell})$, and the cost of their sum is at least the sum
of their individual costs. Since the cost of $f^*$ with respect to $\bar{\ell}$ exceeds its cost with
respect to $\ell$ by at most $C(f)$, the theorem follows. ■

Theorem  3.6.1  can  be  regarded  as  the  limiting  case  of  the  above  theorem,  as  the 
number  of  users  tends  to  infinity  and  the  amount  of  flow  controlled  by  each  user 
tends  to  0. 


4.3  Finitely  Many  Users:  Unsplittable  Flow 

In  this  section  we  continue  our  investigation  of  selfish  routing  with  finitely  many 
users, each controlling a non-negligible amount of flow. It is easy to imagine scenarios
in which users cannot route flow on several different paths, but must instead
select  a  single  path  for  routing.  Our  previous  results  have  made  crucial  use  of 
the  “infinitely  divisible”  nature  of  flow,  and  we  next  show  that  this  assumption  is 
essentially  necessary. 

Consider an instance $(G, r, \ell)$ as in the previous section (with $k$ users and the $i$th
user controlling $r_i$ units of flow), but with the additional constraint that each user
selects a single path on which to route all of its flow. We call such an instance finite
unsplittable; such instances have also been studied by Libman and Orda [116, 117]
(though not from the perspective of quantifying the inefficiency of Nash equilibria).
Adapting the definition of the previous section to this new setting, a flow $f$ (now
consisting only of $k$ paths) is at Nash equilibrium if and only if for each $i$, user $i$
routes its flow on a path minimizing $\ell_P(f)$ (with $P$ ranging over all paths in $\mathcal{P}_i$),
given the paths chosen by the other $k - 1$ users. One technical difficulty of this model
is that Nash equilibria need not exist (unless network users can randomize) [117];
we will see below that even when they do, they may be inefficient in a very strong
sense.

Figure 4.1: A Bad Example for Unsplittable Flow

We first consider a simple example showing that a flow at Nash equilibrium may
have cost arbitrarily larger than that of an optimal flow. Consider the network
shown in Figure 4.1, and suppose there are two users, each of whom has source
$s$, destination $t$, and one unit of flow to send; $\epsilon > 0$ is arbitrary. In the optimal
solution, one user chooses path $s \to v \to t$ and the other $s \to w \to t$; the cost of
this solution is less than 4 (for any $\epsilon > 0$). On the other hand, a solution with one
user choosing path $s \to v \to w \to t$ and the other routing on the $s$-$t$ link is a
flow at Nash equilibrium with cost greater than $1/\epsilon$; by choosing $\epsilon$ arbitrarily small
this cost is arbitrarily large, and hence arbitrarily more costly than optimal.

In light of the example at the beginning of Section 3.6, such a result is hardly
surprising; however, we can extend this example to show that bicriteria bounds
analogous to Theorems 3.6.1 and 4.2.1 are false when we require users to route flow
unsplittably. For a positive integer $q$, consider the network $G_q$ consisting of $2q + 2$
vertices arranged in a path $s, v_1, v_2, \ldots, v_{2q}, t$ with edges along the path alternately
having latency functions $\ell(x) = \frac{1}{q+1+\epsilon-x}$ and $\ell(x) = 0$, a direct $s$-$t$ link with constant
latency function $\ell(x) = \frac{1}{\epsilon}$, and edges from $s$ to $v_{2i}$ and from $v_{2i-1}$ to $t$ with constant
latency function $\ell(x) = q$; this construction produces the network of Figure 4.1
when $q = 1$. Suppose there are $q + 1$ users, each with 1 unit of flow to send from $s$
to $t$. Analogous to the previous paragraph, one Nash equilibrium consists of $q$ users
routing flow on the long path $s \to v_1 \to v_2 \to \cdots \to v_{2q} \to t$ and the final user
routing its flow on the direct $s$-$t$ link. This Nash equilibrium has total latency at
least $\frac{1}{\epsilon}$. On the other hand, for any $\epsilon > 0$ it is possible for each of the $q + 1$ users to
route $q$ units of flow unsplittably through $G_q$ with total cost at most $(2q + 1)(q + 1)^2$:
the first user routes on the path $s \to v_1 \to t$, the last on $s \to v_{2q} \to t$, and otherwise
the $i$th user routes on the path $s \to v_{2i-2} \to v_{2i-1} \to t$. Letting $\epsilon$ tend to 0 for
each fixed value of $q$, we see that an optimal flow can send arbitrarily more flow at
arbitrarily less cost than a flow at Nash equilibrium.
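
The equilibrium claimed for $G_q$ can be checked by brute force for small $q$. The sketch below is an illustration under stated assumptions rather than part of the original argument: it reads the construction as having the first path edge $s \to v_1$ congestible with latency $1/(q+1+\epsilon-x)$, the direct link with constant latency $1/\epsilon$, and vertices numbered $s = 0$, $v_j = j$, $t = 2q+1$. All function names are hypothetical.

```python
def build_gq(q, eps):
    """Edges of G_q as (tail, head, latency_fn); returns (edges, t)."""
    t = 2 * q + 1

    def congestible(x):
        return 1.0 / (q + 1 + eps - x)

    def constant(c):
        return lambda x: c

    # Path edges j -> j+1 alternate: congestible (j even), zero latency (j odd).
    edges = [(j, j + 1, congestible if j % 2 == 0 else constant(0.0))
             for j in range(2 * q + 1)]
    edges.append((0, t, constant(1.0 / eps)))              # direct s-t link
    for i in range(1, q + 1):
        edges.append((0, 2 * i, constant(float(q))))       # s -> v_{2i}
        edges.append((2 * i - 1, t, constant(float(q))))   # v_{2i-1} -> t
    return edges, t


def all_paths(edges, t, node=0):
    """All s-t paths as tuples of edge indices (G_q is acyclic)."""
    if node == t:
        return [()]
    found = []
    for idx, (tail, head, _) in enumerate(edges):
        if tail == node:
            found += [(idx,) + rest for rest in all_paths(edges, t, head)]
    return found


def path_latency(edges, profile, user, path):
    """Latency `user` sees on `path` while the other users keep their paths."""
    load = {}
    for j, chosen in enumerate(profile):
        for e in (path if j == user else chosen):
            load[e] = load.get(e, 0) + 1       # every user routes one unit
    return sum(edges[e][2](load[e]) for e in path)


def is_nash(edges, t, profile, tol=1e-12):
    """No user can strictly lower its latency by switching to another path."""
    paths = all_paths(edges, t)
    for i, chosen in enumerate(profile):
        current = path_latency(edges, profile, i, chosen)
        if any(path_latency(edges, profile, i, alt) < current - tol
               for alt in paths):
            return False
    return True


def total_cost(edges, profile):
    return sum(path_latency(edges, profile, i, p) for i, p in enumerate(profile))


q, eps = 2, 0.01
edges, t = build_gq(q, eps)
long_path, direct = tuple(range(2 * q + 1)), (2 * q + 1,)
profile = [long_path] * q + [direct]
print(is_nash(edges, t, profile), total_cost(edges, profile))
```

The user on the direct link is stuck because every alternative path contains a congestible edge that would then carry $q + 1$ units, at latency about $1/\epsilon$.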

In  the  above  bad  example,  the  network  has  latency  functions  with  unbounded 
derivatives;  in  this  situation,  routing  a  strictly  positive  amount  of  additional  flow 
on  an  edge  may  increase  the  latency  of  that  edge  by  an  arbitrarily  large  amount. 
We note that this example is not “pathological”, in the sense that latency functions
of the form $\ell(x) = 1/(u - x)$ naturally arise in networking applications [20] (see
also  Section  3.5).  However,  in  networks  where  the  largest  possible  change  in  edge 
latency  resulting  from  a  single  user  rerouting  its  flow  is  not  too  large,  we  can  apply 
Theorem  4.1.3  to  derive  the  following. 

Theorem 4.3.1 Suppose $f$ is at Nash equilibrium in the finite unsplittable instance
$(G, r, \ell)$, and for some $\xi < 2$, we have $\ell_e(x + r_i) \le \xi \cdot \ell_e(x)$ for all users $i \in \{1, \ldots, k\}$,
edges $e \in E$, and $x \in [0, \sum_{j \ne i} r_j]$. Then for any flow $f^*$ feasible for $(G, 2r, \ell)$,
$C(f) \le \frac{\xi}{2-\xi} \cdot C(f^*)$.

Proof. We may interpret $f$ and $f^*$ as (fractional) flows feasible for instances $(G, r', \ell)$
and $(G, 2r', \ell)$ of the usual fractional type (in the sense of Section 2.1), where $r'_i$ is
the total amount of flow controlled by users with source $s_i$ and destination $t_i$ in
the original instance. The hypotheses ensure that $f$ is at $(\xi - 1)$-approximate Nash
equilibrium for $(G, r', \ell)$, so the result follows from Theorem 4.1.3. ■

For example, in an instance with linear latency functions (say $\ell_e(x) = a_e x + b_e$)
with $b_e > 0$ for all edges $e$, we may apply Theorem 4.3.1 with $\xi = 1 + \max_i r_i \cdot \max_e a_e / b_e$.
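
As a quick numerical illustration of this choice of parameter, the sketch below computes $\xi = 1 + \max_i r_i \cdot \max_e a_e/b_e$ and the resulting factor $\xi/(2-\xi)$ of Theorem 4.3.1 for linear latencies, and spot-checks the hypothesis $\ell_e(x + r_i) \le \xi\,\ell_e(x)$ on a grid. The function names and the sample numbers are assumptions chosen for illustration.

```python
def unsplittable_bound(rates, edges):
    """Return (xi, factor) for linear latencies l_e(x) = a_e * x + b_e.

    rates: list of user demands r_i; edges: list of (a_e, b_e) with b_e > 0.
    factor is xi / (2 - xi), or None when xi >= 2 and the theorem is silent.
    """
    xi = 1 + max(rates) * max(a / b for a, b in edges)
    return xi, (xi / (2 - xi) if xi < 2 else None)


def hypothesis_holds(rates, edges, xi, samples=200, tol=1e-12):
    """Grid check of l_e(x + r_i) <= xi * l_e(x) for x in [0, sum_{j != i} r_j]."""
    total = sum(rates)
    for r in rates:
        for a, b in edges:
            for s in range(samples + 1):
                x = (total - r) * s / samples
                if a * (x + r) + b > xi * (a * x + b) + tol:
                    return False
    return True


rates = [0.1, 0.2]
edges = [(1.0, 1.0), (2.0, 4.0)]
xi, factor = unsplittable_bound(rates, edges)
print(xi, factor, hypothesis_holds(rates, edges, xi))
```

The bound degrades as users grow large relative to the constant terms $b_e$: once $\max_i r_i \cdot \max_e a_e/b_e$ reaches 1, this choice of $\xi$ no longer satisfies $\xi < 2$.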

4.4  Nonatomic  Congestion  Games 

Both  the  traffic  model  of  Chapter  2  and  the  theorems  of  Chapter  3  can  be  recast  in 
a  more  general  and  abstract  setting;  in  this  section,  we  pursue  this  generalization 
and  explain  its  connections  to  recent  work  by  the  game  theory  community.  Because 
of  the  similarity  to  the  definitions  and  results  of  Chapters  2  and  3,  our  development 
will  be  brief. 

4.4.1 Definitions

A nonatomic congestion game¹ (NCG) is defined by: a finite set $E$ of resources,
each possessing a nonnegative, nondecreasing, and continuous cost function $\ell_e$; a
finite number $k$ of player types; and for each player type $i$, a positive real number $r_i$
describing the amount of players of type $i$ and a (finite) set $\mathcal{S}_i \subseteq 2^E \setminus \{\emptyset\}$ of strategies.
By an assignment (of players to strategies) for a NCG, we mean $k$ functions $f^i : \mathcal{S}_i \to \mathcal{R}^+$
satisfying $\sum_{S \in \mathcal{S}_i} f^i_S = r_i$ for $i = 1, 2, \ldots, k$; we will also denote the assignment
by $f$. We define $f_e$ to be the amount of resource $e$ consumed by the
assignment $f$, namely

$$f_e = \sum_{i=1}^{k} \sum_{S \in \mathcal{S}_i : e \in S} f^i_S.$$

Remark  4.4.1  It  is  easy  to  see  that  the  traffic  routing  model  of  Chapter  2  is 
a  nonatomic  congestion  game,  with  network  edges  as  resources  and  commodities 
defining  player  types  with  strategy  sets  equal  to  collections  of  source-destination 
paths.  On  the  other  hand,  nonatomic  congestion  games  are  more  general  than  the 
traffic routing model: the strategy sets $\mathcal{S}_i$ are not assumed to possess any special
structure  (such  as  that  enjoyed  by  paths  with  a  common  source  and  destination) 
and  are  not  assumed  to  be  disjoint  for  different  player  types. 

The cost $\ell_S(f)$ of a strategy $S$ with respect to an assignment $f$ is $\sum_{e \in S} \ell_e(f_e)$;
the social cost $C(f)$ of an assignment is

$$C(f) = \sum_{i=1}^{k} \sum_{S \in \mathcal{S}_i} \ell_S(f)\, f^i_S = \sum_{e \in E} \ell_e(f_e)\, f_e.$$
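
These definitions translate directly into a few lines of code. The sketch below is illustrative only: the representation of an assignment (one dictionary per player type, keyed by a `frozenset` of resources) and all function names are assumptions. It computes the resource loads $f_e$, strategy costs, and the social cost in both of the forms given above.

```python
def resource_loads(assignment):
    """f_e: total mass of players whose chosen strategy contains resource e.

    assignment: list over player types; entry i maps a strategy (a frozenset
    of resources) to the mass f^i_S of type-i players using it.
    """
    loads = {}
    for per_type in assignment:
        for strategy, mass in per_type.items():
            for e in strategy:
                loads[e] = loads.get(e, 0.0) + mass
    return loads


def strategy_cost(strategy, cost_fn, loads):
    """l_S(f): sum of l_e(f_e) over the resources e in S."""
    return sum(cost_fn[e](loads.get(e, 0.0)) for e in strategy)


def social_cost(assignment, cost_fn):
    """C(f) summed strategy by strategy."""
    loads = resource_loads(assignment)
    return sum(strategy_cost(S, cost_fn, loads) * mass
               for per_type in assignment for S, mass in per_type.items())


def social_cost_by_resource(assignment, cost_fn):
    """C(f) summed resource by resource; agrees with social_cost."""
    loads = resource_loads(assignment)
    return sum(cost_fn[e](x) * x for e, x in loads.items())


# Hypothetical game: three resources, two player types with r_1 = 1, r_2 = 2.
cost_fn = {"a": lambda x: x, "b": lambda x: 1.0, "c": lambda x: 2 * x}
assignment = [
    {frozenset({"a", "b"}): 0.5, frozenset({"c"}): 0.5},
    {frozenset({"a"}): 1.0, frozenset({"b", "c"}): 1.0},
]
print(social_cost(assignment, cost_fn), social_cost_by_resource(assignment, cost_fn))
```

Both ways of summing give the same value, reflecting the identity between the two expressions for $C(f)$.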

We  will  call  an  assignment  minimizing  C(-)  min-cost  or  optimal.  Our  final  definition 
extends  Definition  2.2.1  of  flows  at  Nash  equilibrium  to  NCGs. 

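Definition 4.4.2 An assignment $f$ for a NCG is at Nash equilibrium (or is a Nash
assignment) if for all $i \in \{1, \ldots, k\}$, $S_1, S_2 \in \mathcal{S}_i$ with $f^i_{S_1} > 0$, and $\delta \in (0, f^i_{S_1}]$, we
have $\ell_{S_1}(f) \le \ell_{S_2}(\tilde{f})$, where

$$\tilde{f}^j_S = \begin{cases} f^j_S - \delta & \text{if } j = i \text{ and } S = S_1 \\ f^j_S + \delta & \text{if } j = i \text{ and } S = S_2 \\ f^j_S & \text{otherwise.} \end{cases}$$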


1See  Subsection  4.4.2  below  for  an  explanation  of  this  terminology. 

4.4.2 Related Work

Many  types  of  games  have  been  studied  under  the  moniker  of  “congestion  games”  in 
the  game  theory  literature;  the  salient  feature  of  all  such  games  is  that  the  payoff  to 
a  player  depends  only  on  the  player’s  strategy  and  on  the  number  of  other  players 
choosing  the  same  or  some  “interfering”  strategy.  Rosenthal  [157,  158]  was  the  first 
to describe how the traffic model described in Chapter 2 can be naturally generalized
to a more abstract setting, and introduced the name “congestion game”; our
model is in many respects inspired by his. However, Rosenthal studied games with
a finite number of discrete (or atomic) players restricted to playing pure strategies,
a requirement that in our notation corresponds to insisting that each function $f^i$
assigns a value of $r_i$ to one strategy and a value of 0 to all other strategies. Rosenthal
[157] used a discretized version of the proof of Proposition 2.5.1 to show the
existence of a Nash equilibrium (in pure strategies) provided all $r_i$'s are equal, and
exhibited a game violating this hypothesis with no Nash equilibrium.

More  recently,  Rosenthal’s  work  has  been  extended  in  several  different  directions. 
Monderer  and  Shapley  [127]  introduced  a  class  of  atomic  games  they  call  potential 
games,  which  by  definition  are  games  for  which  a  Nash  equilibrium  in  pure  strategies 
arises  as  the  optimum  solution  to  a  related  optimization  problem  (the  objective 
function of which they call a potential function). Potential games strictly generalize
Rosenthal’s congestion games and have since been studied for their own sake [62,
110, 111, 181, 185]. Holzman and Law-Yone [88] studied necessary and sufficient
conditions for a congestion game (in the sense of Rosenthal [157]) to possess a
Nash equilibrium in pure strategies with certain “nice” properties (such as Pareto-optimality).
Several authors considered congestion games in which all strategies
are  single  resources  (rather  than  subsets  of  resources)  but  in  which  different  player 
types  experience  different  amounts  of  congestion  [103,  123,  149]  and  gave  sufficient 
conditions  for  the  existence  of  Nash  equilibria  (as  well  as  “nice”  Nash  equilibria) 
in  pure  strategies;  see  also  Voorneveld  et  al.  [185]  for  a  survey  of  this  work.  In 
addition,  many  researchers  have  studied  various  nonatomic  versions  of  congestion 
games,  wherein  the  number  of  players  is  assumed  to  be  so  large  that  an  individual 
has  negligible  effect  on  the  outcome  of  the  game;  see  [121,  151,  166]  for  foundational 
work  on  nonatomic  games  and  [22,  73,  124,  126]  for  work  on  nonatomic  congestion 
games  in  particular.  The  nonatomic  setting  is  the  one  to  which  the  techniques  of 
Chapter  3  naturally  generalize,  and  is  the  object  of  our  study  for  the  rest  of  this 
section. 

Despite the growing body of research on congestion games and their variants,
our  work  is  the  first  to  quantify  the  inefficiency  of  Nash  equilibria  in  such  games. 

4.4.3  Bounding  the  Inefficiency  of  Nash  Equilibria  in  NCGs 

We  now  observe  that  all  of  the  results  of  Chapter  3  carry  over  (with  the  same 
proofs)  to  the  more  general  setting  of  NCGs.  To  see  this,  we  first  note  that  the 
optimization  problem  of  computing  an  optimal  assignment  can  be  modeled  as  a 
convex  program  identical  to  the  program  (NLP)  of  Section  2.3;  because  of  this,  all 
of  the  key  propositions  of  Sections  2.2,  2.3,  and  2.5  extend  to  the  setting  of  NCGs 
without  difficulty.  For  example,  we  have  the  following  analogues  of  Proposition  2.2.2 
and Corollary 2.3.2: an assignment $f$ for a NCG is at Nash equilibrium if and only
if for every $i \in \{1, \ldots, k\}$ and $S_1, S_2 \in \mathcal{S}_i$ with $f^i_{S_1} > 0$, $\ell_{S_1}(f) \le \ell_{S_2}(f)$; in a NCG
with standard cost functions (in the sense of Definition 2.3.5), an assignment $f$ is
optimal if and only if it is at Nash equilibrium with respect to cost functions $\ell^*$
(defined  as  in  Section  2.3). 

In addition, careful scrutiny of the proofs of Chapter 3 reveals that the combinatorial
structure of the underlying network was never used. Because of this,
these  proofs  extend,  mutatis  mutandis,  to  the  more  general  setting  of  NCGs.  We 
summarize  (somewhat  informally)  the  main  consequences  below. 

Extension of Theorem 3.3.8: Let $\mathcal{L}$ be a class of standard cost functions, with
anarchy value $\alpha(\mathcal{L})$ defined as in Definitions 3.3.2 and 3.3.3. In a NCG with cost
functions in $\mathcal{L}$, the social cost of an assignment at Nash equilibrium is at most $\alpha(\mathcal{L})$
times that of an optimal assignment.

Extension of Theorem 3.4.2: Let $\mathcal{L}$ be a class of standard cost functions containing
the constant functions. Then the worst-case ratio between the social cost of Nash
and optimal assignments in NCGs with cost functions in $\mathcal{L}$ is achieved (up to an
arbitrarily small additive factor) by NCGs with two resources and one player type
with  only  singleton  strategies. 

Extension of Theorem 3.4.4: Let $\mathcal{L}$ be a class of standard cost functions that is
diverse in the sense that $\{\ell(0) : \ell \in \mathcal{L}\} = (0, \infty)$. Then the worst-case ratio between
the social cost of Nash and optimal assignments in NCGs with cost functions in $\mathcal{L}$ is
achieved  (up  to  an  arbitrarily  small  additive  factor)  by  NCGs  with  one  player  type 
with  only  singleton  strategies. 

Extension of Theorem 3.6.1: The social cost of an assignment at Nash equilibrium
for  a  NCG  with  arbitrary  (continuous,  nondecreasing)  cost  functions  is  at  most  the 
social  cost  of  an  optimal  assignment  for  the  same  NCG  with  twice  as  many  players 
of  each  type. 
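
A tiny NCG of the kind singled out in the extension of Theorem 3.4.2 (two resources, one player type, singleton strategies) makes these statements tangible. The instance below, with cost functions $\ell_1(x) = x$ and $\ell_2(x) = 1$, is the familiar Pigou-style example and is an assumption chosen for illustration, as are the function names. The sketch checks the Nash condition, compares Nash and optimal cost, and then compares the Nash cost with the optimum for twice as many players.

```python
def social_cost(x, r):
    """C(f) when mass x uses resource 1 (cost l_1(x) = x) and the
    remaining mass r - x uses resource 2 (constant cost 1)."""
    return x * x + (r - x) * 1.0


def is_nash(x, r, tol=1e-12):
    """Each used singleton strategy must cost no more than the other one."""
    c1, c2 = x, 1.0                    # l_1(f_1) and l_2(f_2)
    uses_1, uses_2 = x > tol, r - x > tol
    return (not uses_1 or c1 <= c2 + tol) and (not uses_2 or c2 <= c1 + tol)


def optimal_cost(r, steps=1000):
    """Grid search over assignments; an illustrative stand-in for (NLP)."""
    return min(social_cost(r * j / steps, r) for j in range(steps + 1))


nash_cost = social_cost(1.0, 1.0)      # everyone on resource 1
print(is_nash(1.0, 1.0), nash_cost / optimal_cost(1.0))
print(nash_cost <= optimal_cost(2.0))  # comparison with doubled player mass
```

Here the Nash assignment costs 1 while the optimum for the same mass costs $3/4$, a ratio of $4/3$. The optimum for mass 2 costs $7/4$, consistent with the extension of Theorem 3.6.1.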

Remark 4.4.3 Further extensions are possible. For example, we can associate a
positive real number $a^i_{S,e}$ to each resource $e$ contained in strategy $S$ of $\mathcal{S}_i$ that
represents in some sense the “rate of consumption” of resource $e$ by players of type
$i$ selecting strategy $S$. In this new setting, the total consumption $f_e$ of a resource
$e$ by an assignment $f$ is defined to be $f_e = \sum_{i=1}^{k} \sum_{S \in \mathcal{S}_i : e \in S} a^i_{S,e} f^i_S$. Since we expect a
player requiring large (respectively, small) amounts of a particular resource to be
correspondingly sensitive (respectively, insensitive) to the cost of that resource, we
define the cost $\ell_S(f)$ of strategy $S \in \mathcal{S}_i$ with respect to assignment $f$ by $\ell_S(f) =
\sum_{e \in S} a^i_{S,e} \ell_e(f_e)$. For example, if resources are types of food and strategies are dinner
possibilities, rates of consumption correspond to amounts of different ingredients
needed and the cost $\ell_S(f)$ is the total cost of raw ingredients, with the per-unit price
$\ell_e$ of ingredient $e$ a function of the overall demand $f_e$. It is then straightforward to
adapt the proofs of Chapters 2 and 3 to this setting; in particular, the four results
above carry over to this generalization of NCGs. We omit further details, and refer
the interested reader to [164].

Remark  4.4.4  The  reader  may  well  wonder  at  this  point  whether  any  conceivable 
statement  that  holds  for  the  network  model  of  Chapter  2  fails  to  also  hold  for 
NCGs.  One  argument  that  does  not  immediately  generalize  from  networks  to  NCGs 
is  provided  by  the  forthcoming  proof  of  Theorem  5.4.1,  which  bounds  the  worst- 
case  severity  of  losses  due  to  harmful  extraneous  edges — as  in  Braess’s  Paradox — in 
networks  with  general  latency  functions  and  selfish  routing.  This  proof  makes  use 
of  the  graph-theoretic  notions  of  acyclicity  and  cuts,  notions  without  analogues  in 
NCGs.  Indeed,  even  the  (best-possible)  guarantee  of  Theorem  5.4.1  is  a  function 
of  the  number  of  network  vertices;  since  NCGs  possess  the  analogue  of  an  edge  set 
(namely,  resources)  but  not  of  a  vertex  set,  it  is  unclear  what  sort  of  generalization 
of  Theorem  5.4.1  to  NCGs  can  be  hoped  for. 


Part  III 

Coping  with  Selfishness 


Chapter 5

Designing Networks for Selfish Users


Motivated  by  our  previous  examples  showing  that  flows  at  Nash  equilibrium  may 
be  quite  inefficient,  in  this  chapter  and  the  next  we  study  methods  for  coping  with 
selfishness — that  is,  for  ensuring  that  selfish  behavior  results  in  a  desirable  outcome. 
In  this  chapter,  we  explore  the  following  idea  for  ameliorating  the  degradation  in 
network  performance  due  to  selfish  routing:  armed  with  the  knowledge  that  our 
networks  will  be  host  to  selfish  users,  how  can  we  design  them  to  minimize  the 
inefficiency  inherent  in  a  user-defined  equilibrium? 

5.1  Introduction 

A  natural  measure  for  the  performance  of  a  network  host  to  selfish  users  sharing 
a  common  source  s  and  destination  t  is  the  common  latency  experienced  by  each 
user  in  a  flow  at  Nash  equilibrium  (see  Proposition  2.2.2).  Recall  that  Braess’s 
Paradox  (see  Subsections  1.2.2  and  2.4.2)  demonstrates  how  removing  edges  from 
a  network  may  improve  its  performance.  This  phenomenon  suggests  the  following 
network  design  problem:  given  a  network  with  latency  functions  on  the  edges  and  a 
traffic  rate,  which  edges  should  be  removed  to  obtain  the  best  possible  flow  at  Nash 
equilibrium?  Equivalently,  given  a  large  network  of  candidate  edges  to  build,  which 
subnetwork  will  exhibit  the  best  performance  when  used  selfishly? 
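
The design problem can be illustrated on the textbook form of Braess's Paradox, which we assume here matches the example of Subsections 1.2.2 and 2.4.2: edges $s \to v$ and $w \to t$ with latency $x$, edges $v \to t$ and $s \to w$ with latency 1, a zero-latency edge $v \to w$, and one unit of traffic. The sketch below uses hypothetical names and a hand-supplied candidate flow rather than an equilibrium solver. It verifies the Nash condition of Proposition 2.2.2 and reports the common latency with and without the edge $v \to w$.

```python
def nash_latency(paths, latency, flow, tol=1e-9):
    """Return the common latency if `flow` is at Nash equilibrium, else None.

    paths: dict path_name -> list of edge names (single commodity)
    latency: dict edge name -> latency function of the edge flow
    flow: dict path_name -> flow on that path
    """
    load = {e: 0.0 for e in latency}
    for p, edges in paths.items():
        for e in edges:
            load[e] += flow[p]
    cost = {p: sum(latency[e](load[e]) for e in edges)
            for p, edges in paths.items()}
    best = min(cost.values())
    used = [cost[p] for p in paths if flow[p] > 0]
    return best if all(c <= best + tol for c in used) else None


latency = {"sv": lambda x: x, "vt": lambda x: 1.0,
           "sw": lambda x: 1.0, "wt": lambda x: x, "vw": lambda x: 0.0}
full = {"up": ["sv", "vt"], "down": ["sw", "wt"], "zigzag": ["sv", "vw", "wt"]}
sub = {p: edges for p, edges in full.items() if "vw" not in edges}

# Full network: all traffic takes the zigzag path.
print(nash_latency(full, latency, {"up": 0.0, "down": 0.0, "zigzag": 1.0}))
# Subnetwork without v -> w: traffic splits evenly.
print(nash_latency(sub, latency, {"up": 0.5, "down": 0.5}))
```

In this instance deleting the edge $v \to w$ lowers the common latency from 2 to $3/2$, which is exactly the kind of improvement the design problem seeks.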

5.1.1 Summary of Results

We  give  optimal  inapproximability  results  and  approximation  algorithms  for  several 
network  design  problems  of  the  following  type:  given  a  network  with  edge  latency 
functions,  a  single  source- destination  pair,  and  a  rate  of  traffic,  find  the  subnetwork 
minimizing  the  travel  time  of  all  (selfish)  network  users  in  a  flow  at  Nash  equilibrium. 
Specifically,  we  prove  the  following  for  any  e  >  0  (assuming  P  ^  NP ): 

-  General Latency Network Design: for networks with continuous,
nonnegative, nondecreasing edge latency functions, there is no
$(\lfloor n/2 \rfloor - \epsilon)$-approximation algorithm for network design, where $n$
is the number of vertices in the network. We also prove this hardness result to be
best possible by exhibiting an $\lfloor n/2 \rfloor$-approximation algorithm for the problem.

-  Linear Latency Network Design: for networks in which the latency
of each edge is a linear function of the congestion, there is no
$(\frac{4}{3} - \epsilon)$-approximation algorithm for network design. The existence of a
$\frac{4}{3}$-approximation algorithm follows easily from our work bounding the price of
anarchy in such networks, proving this hardness result sharp.

Moreover,  we  prove  that  an  optimal  approximation  algorithm  for  these  problems  is 
what we call the trivial algorithm: given a network of candidate edges, build the
entire  network.  As  a  consequence  of  the  optimality  of  the  trivial  algorithm,  we  prove 
that  inefficiency  due  to  harmful  extraneous  edges  is  impossible  to  detect  efficiently, 
even  in  worst-possible  instances. 

Finally,  we  consider  additional  classes  of  latency  functions  (such  as  polynomials 
of  bounded  degree)  and  show  that  our  strong  hardness  results  are  not  particular  to 
the  classes  of  general  and  linear  latency  functions. 

5.1.2  Related  Work 

Paradoxes 

Braess’s  Paradox  [28]  (as  described  in  Subsections  1.2.2  and  2.4.2)  has  intrigued 
researchers  ever  since  its  discovery,  appearing  frequently  in  textbooks  [40,  76,  107, 
129,  133,  169]  and  the  popular  science  literature  [10,  14,  15,  35,  101,  146].  In 
addition,  Braess’s  Paradox  has  led  to  many  further  research  developments:  it  has 
catalyzed  the  search  for  other  “paradoxes”  in  traffic  networks  [9,  33,  51,  63,  82,  92, 
93, 97, 172, 175, 191] and for analogues of Braess’s Paradox in queueing networks [31,
37,  38,  188]  as  well  as  in  seemingly  unrelated  contexts  (such  as  the  strings  and  springs 
example  of  Subsection  1.3.1)  [16,  30,  36,  96];  renewed  interest  in  the  older  “Downs- 
Thomson  paradox”  [29,  55,  180];  and  even  stirred  debate  over  its  implications  for 
classical  philosophical  problems  [89,  120]. 

Network  Design 

Motivated by the discovery of Braess’s Paradox and evidence of similarly
counterintuitive and counterproductive traffic behavior following the construction of new
roads in congested cities [64, 100, 101, 128], researchers have tried to develop network
design strategies that avoid Braess’s Paradox. However, scant progress has been
made on the network design problem that we study, either computationally or
theoretically. Indeed, early computational work on the problem either focused on very
small networks [114] or admitted to ignoring congestion effects entirely, due to the
difficulties involved [27, 52, 86, 154, 168, 189]; in a 1984 survey, Magnanti and Wong
describe the problem as “essentially unsolved” from a practical perspective [119,
p. 15]. On the theoretical side, all existing work on the network design problem that
we  study  and  on  the  problem  of  detecting  the  presence  of  harmful  extraneous  edges 
either  exclusively  considers  the  four-node  networks  of  Figure  1.2  [71,  143,  144]  or 
other  very  special  classes  of  networks  [72,  125],  or  focuses  entirely  on  the  special 
case  where  only  one  edge  is  to  be  added  or  deleted  from  the  network  (as  opposed 
to  seeking  the  best  subgraph  of  a  network,  which  may  contain  many  fewer  edges 
than  the  entire  network)  [48,  125,  176,  178].  Prior  to  our  work  the  network  design 
problem  studied  in  this  chapter  was  not  known  to  be  NP-hard,  nor  was  any  heuristic 
for  the  problem  known  to  have  a  finite  approximation  ratio. 

Indirectly  related  to  our  work  is  a  series  of  papers  [7,  60,  105,  106,  116]  that 
study  traffic  models  that  differ  from  ours  in  that  latency  functions  are  assumed  to 
be  capacitated  (often  with  the  M/M/1  delay  functions  described  in  Section  3.5) 
and  network  users  are  assumed  to  control  a  strictly  positive  (rather  than  negligible) 
fraction  of  the  overall  traffic.  These  works  consider  the  problem  of  allocating  a 
fixed  amount  of  additional  capacity  to  network  edges  to  obtain  the  largest  possible 
improvement  in  the  Nash  equilibrium.  Since  these  papers  either  confine  themselves 
to  networks  of  parallel  links  or  provide  only  sufficient  (and  far  from  necessary) 
conditions  for  a  given  capacity  allocation  to  improve  network  performance,  they  are 
not  directly  relevant  for  our  network  design  problem. 

Finally, we point out that the network design problem considered here is
fundamentally different from most of the network design problems studied in the
theoretical  computer  science  literature  (such  as  those  described  by  Goemans  and 
Williamson  [79]),  which  typically  ask  for  the  cheapest  network  satisfying  certain 
desiderata  such  as  high  connectivity  or  small  diameter.  Problems  of  this  sort  are 
only  nontrivial  in  the  presence  of  costs  on  vertices  and/or  edges;  otherwise,  the  best 
solution  is  to  simply  build  the  largest  possible  network.  On  the  other  hand,  Braess’s 
Paradox  shows  this  approach  to  be  suboptimal  for  our  network  design  problem;  even 
in  the  absence  of  costs,  it  is  not  at  all  clear  which  network  should  be  preferred. 

5.1.3  Organization 

We begin in Section 5.2 by briefly formalizing how we encode network design
instances in a machine-readable way. In each of the next three sections, we prove
matching upper and lower bounds on the approximability of network design for a
different class of allowable edge latency functions. Linear latency functions are
considered in Section 5.3, general (continuous and nondecreasing) latency functions in
Section 5.4, and polynomial latency functions in Section 5.5.

5.2  Encodings  of  Latency  Functions 

We  have  thus  far  studied  analytic  problems  (bounding  the  inefficiency  of  Nash  flows) 
rather  than  algorithmic  ones;  for  this  reason,  we  have  not  yet  needed  to  describe 
how an instance $(G, r, \ell)$ should be encoded as input to a (mathematical model of a)
computer. Before we present complexity results for the problem of network design,
we must be precise about our encodings of network latency functions (encoding the
network $G$ and a rational rate vector $r$ can be done via any standard method; see
Aho et al. [2], for example). In this chapter, we assume that every edge latency
function  is  either  a  polynomial  with  rational  coefficients  (in  which  case  the  input 
complexity  is  the  number  of  bits  needed  to  represent  the  values  and  positions  of  the 
coefficients)  or  a  piecewise  linear  function  described  by  a  finite  number  of  rational 
slopes  and  breakpoints  (in  which  case  the  input  complexity  is  the  number  of  bits 
needed  to  describe  the  slopes  and  breakpoints).  Less  restrictive  assumptions  are  of 
course  possible,  but  they  render  hardness  of  approximation  results  less  compelling. 
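
As an illustration of the two encodings just described, the following is a minimal sketch using exact rational arithmetic. The function names and the choice to represent a piecewise linear function by its breakpoints as (x, y) pairs (constant outside the listed range) are assumptions made for this example, not the encoding fixed by the text.

```python
from fractions import Fraction


def poly_latency(coeffs):
    """Polynomial latency with rational coefficients [c0, c1, ...] = c0 + c1*x + ..."""
    cs = [Fraction(c) for c in coeffs]

    def ell(x):
        x = Fraction(x)
        return sum(c * x**i for i, c in enumerate(cs))

    return ell


def piecewise_linear_latency(points):
    """Piecewise linear latency through rational breakpoints (x, y), sorted by x.

    Assumption for this sketch: the function is constant to the left of the
    first breakpoint and to the right of the last one.
    """
    pts = [(Fraction(x), Fraction(y)) for x, y in points]

    def ell(x):
        x = Fraction(x)
        if x <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x <= x1:
                # Linear interpolation on the segment [x0, x1].
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        return pts[-1][1]

    return ell


linear = poly_latency([1, 2])                                # 1 + 2x
ramp = piecewise_linear_latency([(0, 0), (1, 0), (2, 4)])    # flat, then slope 4
```

Either representation has size polynomial in the number of bits of its rational parameters, which is the property the hardness results rely on.
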

5.3 Linear Latency Functions: An Approximability Threshold of 4/3

We begin with the setting in which the latency of every edge of the network is a
linear function of the congestion (that is, each latency function $\ell_e$ may be written
$\ell_e(x) = a_e x + b_e$ for $a_e, b_e \ge 0$), as our proof of the inapproximability of network
design is particularly simple in this special case.

We next formalize our network design problem. Throughout this chapter, we
assume all instances to be single-commodity. By Propositions 2.2.2 and 2.5.1, the
following definition makes sense: for an instance $(G, r, \ell)$ with source $s$ and
destination $t$ admitting a Nash flow $f$, we define $L(G, r, \ell)$ to be the common latency
(with respect to $f$) of every $s$-$t$ flow path of $f$. If $G$ has no $s$-$t$ path, we define
$L(G, r, \ell) = +\infty$. When no confusion results, we will abbreviate the expression
$L(G, r, \ell)$ by $L(G)$. We may then formally state our network design problem as
follows:

Given an instance $(G, r, \ell)$, find the subgraph $H$ of $G$ minimizing $L(H, r, \ell)$.

Recall that the trivial algorithm, when presented with instance $(G, r, \ell)$, outputs
the network $G$ (i.e., always decides to build the entire network). That the trivial
algorithm is a $\frac{4}{3}$-approximation algorithm for Linear Latency Network Design
is an easy corollary (essentially identical to Corollary 3.2.7) of Theorem 3.2.6, which
states that in a network with linear latency functions a Nash flow has cost at most
$\frac{4}{3}$ times that of any other feasible flow.

Corollary 5.3.1 The trivial algorithm is a $\frac{4}{3}$-approximation algorithm for Linear
Latency Network Design.

Proof. Consider any instance $(G, r, \ell)$ with linear latency functions, with subgraph
$H$ minimizing $L(H, r, \ell)$. Let $f$ and $f^*$ denote flows at Nash equilibrium for $(G, r, \ell)$
and $(H, r, \ell)$, respectively. By Proposition 2.2.4, we may write $C(f) = r \cdot L(G, r, \ell)$
and $C(f^*) = r \cdot L(H, r, \ell)$. Since $f^*$ is also feasible for $(G, r, \ell)$, Theorem 3.2.6 implies
that $C(f) \le \frac{4}{3} C(f^*)$ and hence $L(G, r, \ell) \le \frac{4}{3} L(H, r, \ell)$. ■
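
The argument above amounts to a single chain of (in)equalities, restated here for convenience; the last step divides through by the traffic rate $r > 0$.

```latex
\[
  r \cdot L(G, r, \ell) \;=\; C(f)
  \;\le\; \tfrac{4}{3}\, C(f^*)
  \;=\; \tfrac{4}{3}\, r \cdot L(H, r, \ell)
  \quad\Longrightarrow\quad
  L(G, r, \ell) \;\le\; \tfrac{4}{3}\, L(H, r, \ell).
\]
```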

The  main  result  of  this  section  is  that,  unless  P  =  NP,  no  better  approximation 
is  possible  in  polynomial  time. 

Theorem 5.3.2 For any $\epsilon > 0$, there is no $(\frac{4}{3} - \epsilon)$-approximation algorithm for
Linear Latency Network Design unless P = NP.

Figure 5.1: Proof of Theorem 5.3.2. In a “no” instance of 2DDP, existence of $s_1$-$t_1$
and $s_2$-$t_2$ paths implies the existence of an $s_2$-$t_1$ path.


Proof. We will make use of the problem 2 Directed Disjoint Paths (2DDP):
given a directed graph $G = (V, E)$ and distinct vertices $s_1, s_2, t_1, t_2 \in V$, are there
$s_i$-$t_i$ paths $P_i$ for $i = 1, 2$ such that $P_1$ and $P_2$ are vertex-disjoint? This problem was
proved NP-complete by Fortune et al. [70]. We will show that a $(\frac{4}{3} - \epsilon)$-approximation
algorithm for Linear Latency Network Design can be used to distinguish
“yes” and “no” instances of 2DDP in polynomial time.

Consider an instance $X$ of 2DDP, as above. Augment the vertex set $V$ by an
additional source $s$ and sink $t$, and include directed edges $(s, s_1)$, $(s, s_2)$, $(t_1, t)$,
and $(t_2, t)$ (see Figure 5.1). Denote the new network by $G' = (V', E')$ and endow the
edges of $E'$ with linear latency functions $\ell$ as follows: all edges of $E$ are given the
latency function $\ell(x) = 0$, edges $(s, s_2)$ and $(t_1, t)$ are given the latency function
$\ell(x) = x$, and edges $(s, s_1)$ and $(t_2, t)$ are given the latency function $\ell(x) = 1$. This
construction can clearly be done in polynomial time.
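
The construction of $G'$ is simple enough to sketch directly. In the sketch below, the function name, the representation of a linear latency $\ell(x) = ax + b$ as a pair (a, b), and the vertex labels "s" and "t" (assumed not to clash with names already in $V$) are all choices made for illustration.

```python
def build_reduction(edges, s1, s2, t1, t2):
    """Build G' from a 2DDP instance.

    Returns a dict mapping each directed edge (u, v) of G' to a pair (a, b)
    encoding the linear latency function l(x) = a*x + b.
    """
    latency = {(u, v): (0, 0) for (u, v) in edges}  # edges of E: l(x) = 0
    latency[("s", s2)] = (1, 0)   # l(x) = x
    latency[(t1, "t")] = (1, 0)   # l(x) = x
    latency[("s", s1)] = (0, 1)   # l(x) = 1
    latency[(t2, "t")] = (0, 1)   # l(x) = 1
    return latency


# A tiny "yes" instance: two vertex-disjoint single-edge paths.
example = build_reduction([("s1", "t1"), ("s2", "t2")], "s1", "s2", "t1", "t2")
```

The construction adds two vertices and four edges, so it runs in time linear in the size of the 2DDP instance.
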

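To complete the proof, it suffices to show the following two statements: (i) if $X$ is
a “yes” instance of 2DDP, then there is a subgraph $H$ of $G'$ satisfying $L(H, 1, \ell) = \frac{3}{2}$;
(ii) if $X$ is a “no” instance, then for any subgraph $H$ of $G'$, $L(H, 1, \ell) \ge 2$. Since
$2 / \frac{3}{2} = \frac{4}{3}$, a $(\frac{4}{3} - \epsilon)$-approximation algorithm would return a subgraph of value
less than 2 exactly when $X$ is a “yes” instance.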

To prove (i), let $P_1$ and $P_2$ be vertex-disjoint $s_1$-$t_1$ and $s_2$-$t_2$ paths in $G$,
respectively, and obtain $H$ by deleting all edges of $G$ not contained in some $P_i$. Then,
$H$ is a subgraph of $G'$ with exactly two $s$-$t$ paths, and routing half a unit of flow
along each yields a flow at Nash equilibrium in which each path has latency $\frac{3}{2}$ (cf.
Figure 1.2(a)).

For (ii), we may assume that $H$ contains an $s$-$t$ path. If $H$ has an $s$-$t$ path $P$
containing an $s_2$-$t_1$ path, then define a flow $f$ by routing a single unit of flow on $P$;
this is a flow at Nash equilibrium, with respect to which every $s$-$t$ path has latency
2 (cf. Figure 1.2(b)), so $L(H) = 2$. Otherwise, since $X$ is a “no” instance, there are
only two remaining possibilities (see Figure 5.1): either for precisely one $i \in \{1, 2\}$,
$H$ has an $s$-$t$ path $P$ containing an $s_i$-$t_i$ path, or all $s$-$t$ paths $P$ in $H$ contain an
$s_1$-$t_2$ path of $G$. In either case, routing one unit of flow along such a path $P$ provides
a flow at Nash equilibrium showing that $L(H) = 2$. ■

Corollary 5.3.1 and Theorem 5.3.2 imply that efficiently detecting whether or not
network performance is hampered by harmful extraneous edges in networks with
linear latency functions is impossible (assuming P $\neq$ NP), even in instances suffering
from the most severe manifestations of this paradox. To make this statement precise,
call an instance $(G, r, \ell)$ with linear latency functions paradox-free if
$L(H, r, \ell) \ge L(G, r, \ell)$ for all subgraphs $H$ of $G$ (i.e., if the entire network is an optimal
subnetwork) and paradox-ridden if for some subgraph $H$ of $G$,
$L(H, r, \ell) = \frac{3}{4} L(G, r, \ell)$. By Corollary 5.3.1, paradox-ridden instances are precisely
those incurring a worst-possible loss in network performance due to detrimental extra
edges. The construction in the proof of Theorem 5.3.2 then gives the following corollary.

Corollary 5.3.3 Given an instance $(G, r, \ell)$ with linear latency functions that is
either paradox-free or paradox-ridden, it is NP-hard to decide whether or not $(G, r, \ell)$
is paradox-ridden.

5.4 General Latency Functions: An Approximability Threshold of $\lfloor n/2 \rfloor$

In  this  section  we  consider  the  problem  of  network  design  with  the  broadest  possible 
class  of  latency  functions  (assuming  we  insist  on  the  existence  and  uniqueness  of  flows 
at  Nash  equilibrium),  the  set  of  all  continuous  nondecreasing  functions.  We  begin 
by proving in Subsection 5.4.1 that the trivial algorithm achieves an approximation
ratio of $\lfloor n/2 \rfloor$, where $n$ is the number of vertices in the network (in contrast to
other  sections,  this  performance  guarantee  does  not  trivially  follow  from  our  work 
bounding  the  price  of  anarchy).  In  Subsection  5.4.2,  we  introduce  a  new  family  of 
graphs  generalizing  the  network  of  the  original  Braess’s  Paradox  (Figure  1.2(b)); 
this  family  may  be  of  independent  interest,  as  (to  the  best  of  our  knowledge)  these 
networks  give  the  first  demonstration  that  the  severity  of  Braess’s  Paradox  can 
increase  with  the  network  size.  We  conclude  in  Subsection  5.4.3  by  using  this  family 
to  prove  an  optimal  hardness  result  matching  the  upper  bound  provided  by  the 
trivial  algorithm. 

5.4.1 An $\lfloor n/2 \rfloor$-Approximation Algorithm

Our goal in this subsection is to prove that the trivial algorithm is an
$\lfloor n/2 \rfloor$-approximation algorithm for General Latency Network Design, where $n$ is
the number of vertices in the network. Before embarking on the proof, it is important
to contrast the settings of general and linear latency functions. In particular,
we saw in the proof of Corollary 5.3.1 that a known result upper bounding the total
latency of a Nash flow relative to any other feasible flow immediately yielded
an identical upper bound on the performance of the trivial algorithm. Thus, if we
knew that a Nash flow in a network with $n$ vertices and general latency functions
was at most $g(n)$ times as costly (with respect to the total latency measure) as any
other feasible flow for some “nice” (e.g., linear) function $g(\cdot)$, we would be done.¹
Unfortunately, the nonlinear variant of Pigou’s example (Subsection 2.4.4) shows
that no such result can hold: with general latency functions, a Nash flow may be
arbitrarily more costly than other feasible flows, even in networks with only two
vertices and two edges.

However, this fact is no cause for abandoning the goal of proving some
kind of performance guarantee for the trivial algorithm; it merely indicates that a
more delicate approach is required. In the example of Subsection 2.4.4, the flow
with near-zero cost was far from equilibrium: a few martyrs were routed on
the upper edge (the edge with constant latency function $\ell(x) = 1$) for the benefit
of the overwhelming majority of the flow (on the lower edge). Indeed, all (nonempty)
subgraphs $H$ of $G$ satisfy $L(H) = 1$. Thus, while any subgraph provides
an optimal solution to our network design problem, comparing against arbitrary
feasible flows gives us no way of proving any finite approximation ratio!

By comparing the output of the trivial algorithm only to feasible flows at
equilibrium in a subgraph of $G$ (rather than to all feasible flows), we obtain the main
result of this subsection.

Theorem 5.4.1 For any instance $(G, r, \ell)$ with $|V(G)| = n$, the trivial algorithm
returns a solution of value at most $\lfloor n/2 \rfloor$ times that of the optimal solution.

Proof. Let $f$ and $f^*$ be flows at Nash equilibrium for $(G, r, \ell)$ and $(H, r, \ell)$,
respectively, with $H$ a subgraph of $G$ containing an $s$-$t$ path. By Propositions 2.5.1
and 2.6.2, we may assume that $f$ is acyclic. Put $L = L(G, r, \ell)$ and $L^* = L(H, r, \ell)$;
we wish to prove that $L \le \lfloor n/2 \rfloor \cdot L^*$.

¹ Indeed, this argument will reoccur in Section 5.5.

[Figure 5.2 not reproduced; surviving labels: edge latencies $\ell(f) \le L^*$ and vertex
labels $d = 0$, $d \le L^*$, $d \le 2L^*$.]

Figure 5.2: Proof of Theorem 5.4.1. If $f$ is the flow sending one unit of flow on the
four-hop path and $f^*$ is the flow sending half a unit of flow on each of the other two
paths, then the dashed edges are light.


The rest of the proof will make crucial use of Proposition 2.6.1. Accordingly,
define $d(v)$ for $v \in V(G)$ as in Proposition 2.6.1, as the length (with respect to
edge lengths $\ell_e(f_e)$) of a shortest $s$-$v$ path. Assume for simplicity that $n$ is odd
and that every vertex of $G$ is incident to an edge $e$ with $f_e > 0$; extending the
following argument to the general case is straightforward. Order the vertices
$s = v_0, v_1, \ldots, v_{n-1} = t$ according to nondecreasing $d(v)$-value. If there is an edge
$e = (v, w)$ with $f_e > 0$ and $\ell_e(f_e) = 0$ (so, by Proposition 2.6.1, $d(v) = d(w)$), break
the tie by placing $v$ before $w$ in the ordering; this will always be possible since $f$ is
acyclic. Proposition 2.6.1 implies that this ordering is a topological one with respect
to the flow $f$; that is, whenever $f_e > 0$, $e$ is a forward edge with respect to our
ordering. Our proof approach will be to show, by induction on $i$, that $d(v_{2i}) \le i \cdot L^*$;
the base case $i = 0$ is trivial.

Before considering the inductive step, we require a definition and a claim. Call
an edge $e$ light if $f_e \le f^*_e$ and $f^*_e > 0$ (in particular, $e$ must be present in $H$). Light
edges are useful to us because they have latency at most $L^*$ with respect to $f^*$ (as
every flow path of $f^*$ has latency $L^*$) and hence latency at most $L^*$ with respect
to $f$ (since latencies are nondecreasing); thus, vertices of $G$ that are adjacent via a
light edge differ in $d$-values by at most $L^*$. The next claim assures us of a healthy
supply of light edges: every $s$-$t$ cut consisting of a set of consecutive vertices (with
respect to our topological ordering) contains a light edge (see Figure 5.2).

Claim: Let $S = \{v_0, \ldots, v_k\}$ for some $k \in \{0, 1, \ldots, n-2\}$. Then some light edge
has its tail in $S$ and head outside of $S$.

Proof. Let $\delta^+(S)$ denote the edges with tail inside $S$ and head outside $S$, and $\delta^-(S)$
the edges with head inside $S$ and tail outside $S$. Since $S$ is an $s$-$t$ cut and $f$ is an
$s$-$t$ flow of value $r$ with no flow on edges in $\delta^-(S)$ (as the vertices are topologically
sorted according to $f$), $\sum_{e \in \delta^+(S)} f_e = r$. Since $S$ is an $s$-$t$ cut and $f^*$ is an $s$-$t$ flow,
$\sum_{e \in \delta^+(S)} f^*_e \ge r$. Hence, $f_e \le f^*_e$ for some $e \in \delta^+(S)$ with $f^*_e > 0$. ■
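
To make the claim concrete, the following sketch lists the light edges crossing each prefix cut on a small example. The example is an assumption chosen for illustration (it is the four-vertex Braess network, not the network of Figure 5.2): $f$ routes one unit on $s \to v \to w \to t$, $f^*$ routes half a unit on each of $s \to v \to t$ and $s \to w \to t$, and the vertex order $s, v, w, t$ is topological with respect to $f$.

```python
from fractions import Fraction


def light_edges_across(order, k, f, f_star):
    """Light edges with tail in S = {order[0], ..., order[k]} and head outside S.

    An edge e is light if f_e <= f*_e and f*_e > 0; edges missing from a flow
    dict are treated as carrying zero flow.
    """
    S = set(order[: k + 1])
    return [
        (u, v)
        for (u, v), x_star in f_star.items()
        if x_star > 0 and f.get((u, v), 0) <= x_star and u in S and v not in S
    ]


order = ["s", "v", "w", "t"]
f = {("s", "v"): Fraction(1), ("v", "w"): Fraction(1), ("w", "t"): Fraction(1)}
half = Fraction(1, 2)
f_star = {("s", "v"): half, ("v", "t"): half, ("s", "w"): half, ("w", "t"): half}

# One list of crossing light edges per cut S = {v_0, ..., v_k}, k = 0, ..., n-2.
crossing = [light_edges_across(order, k, f, f_star) for k in range(len(order) - 1)]
```

In this example every prefix cut is crossed by at least one of the two light edges $(s, w)$ and $(v, t)$, as the claim predicts.
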

Now suppose $i \in \{1, \ldots, (n-1)/2\}$ and $d(v_{2(i-1)}) \le (i-1)L^*$. Let $k$ be the largest
integer such that there is a path of light edges from $v_j$ to $v_k$ for some $j \le 2(i-1)$; we
will show that $k \ge 2i$. The previous claim immediately implies that $k$ is well defined
with $k > 2(i-1)$ (consider the head of a light edge in $\delta^+(\{v_0, \ldots, v_{2(i-1)}\})$). To see
that $k \ge 2i$, observe that if $k = 2i-1$ then all light edges in $\delta^+(\{v_0, \ldots, v_{2(i-1)}\})$ (and
there must be one) have head $v_{2i-1}$ and no light edge in $\delta^+(\{v_0, \ldots, v_{2i-1}\})$ has tail
$v_{2i-1}$ (otherwise we would append such an edge to our maximal path), contradicting
that $\delta^+(\{v_0, \ldots, v_{2i-1}\})$ must contain a light edge.

We have established the existence of a path $P$ of light edges from $v_j$ to $v_k$ with
$j \le 2(i-1)$ and $k \ge 2i$. Inductively, we have $d(v_j) \le d(v_{2(i-1)}) \le (i-1)L^*$;
since $d(v_{2i}) \le d(v_k)$, we can finish the inductive step and the proof by showing that
$d(v_k) - d(v_j) \le L^*$ (informally, $d(v_{2(i-1)})$ and $d(v_{2i})$ are sandwiched between $d(v_j)$
and $d(v_k)$, so it suffices to upper bound the gap between the latter pair of numbers).
Letting $d^*(v)$ denote the length of a shortest $s$-$v$ path in $H$ with respect to edge
lengths $\ell_e(f^*_e)$, we can apply Proposition 2.6.1 to $f^*$ in $H$ to obtain
$0 = d^*(s) \le d^*(v_j) \le d^*(v_k) \le d^*(t) = L^*$. By Proposition 2.6.1, this implies that the
latency of $P$ with respect to $f^*$ is at most $L^*$; since all edges of $P$ are light, it follows
that the latency of $P$ with respect to $f$ is at most $L^*$. A final application of
Proposition 2.6.1 then yields $d(v_k) - d(v_j) \le L^*$, completing the inductive step and the
proof. ■
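
For reference, the inductive step and the final bound can be collected into two displays. This is only a restatement of the argument above; the identification $L = d(t) = d(v_{n-1})$ is assumed to follow from Proposition 2.6.1 in the same way as $d^*(t) = L^*$, and $n$ is odd as in the proof.

```latex
\[
  d(v_{2i}) \;\le\; d(v_k) \;\le\; d(v_j) + L^*
  \;\le\; d(v_{2(i-1)}) + L^*
  \;\le\; (i-1)L^* + L^* \;=\; i \cdot L^* ,
\]
\[
  L \;=\; d(t) \;=\; d(v_{n-1}) \;\le\; \frac{n-1}{2} \cdot L^*
  \;=\; \lfloor n/2 \rfloor \cdot L^* .
\]
```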

5.4.2  The  Braess  Graphs 

We  seek  to  prove  a  lower  bound  on  the  approximability  of  network  design  (and  in 
particular,  on  the  performance  of  the  trivial  algorithm)  that  is  linear  in  the  number 
of  vertices  of  the  network.  Toward  this  end,  we  will  construct  an  infinite  family 
of  networks  on  which  the  trivial  algorithm  performs  poorly  (networks  in  which  the 
value  of  a  flow  at  Nash  equilibrium  can  be  vastly  improved  by  removing  some  edges); 
we  will  prove  hardness  results  in  the  next  subsection  via  similar  but  more  involved 
arguments. 

We define the $k$th Braess graph $B^k$ as follows: start with a set
$V^k = \{s, v_1, \ldots, v_k, w_1, \ldots, w_k, t\}$ of $2k+2$ vertices and define $E^k$ by
$\{(s, v_i), (v_i, w_i), (w_i, t) : 1 \le i \le k\} \cup \{(v_i, w_{i-1}) : 2 \le i \le k\} \cup \{(v_1, t)\} \cup \{(s, w_k)\}$
(see Figure 5.3). We note that $B^1$ is the graph in which Braess’s Paradox was first
discovered (Figure 1.2(b)).

Figure 5.3: The second and third Braess graphs, (a) $B^2$ and (b) $B^3$.
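
The definition translates directly into code. The sketch below is an illustration with assumed vertex labels ("s", "t", "v1", ..., "w1", ...) and an assumed function name; it builds the vertex and edge lists of the $k$th Braess graph in the order of the four sets in the definition.

```python
def braess_graph(k):
    """Return (V, E) for the k-th Braess graph, with 2k+2 vertices."""
    V = (["s"]
         + [f"v{i}" for i in range(1, k + 1)]
         + [f"w{i}" for i in range(1, k + 1)]
         + ["t"])
    E = []
    for i in range(1, k + 1):
        # {(s, v_i), (v_i, w_i), (w_i, t) : 1 <= i <= k}
        E += [("s", f"v{i}"), (f"v{i}", f"w{i}"), (f"w{i}", "t")]
    # {(v_i, w_{i-1}) : 2 <= i <= k}
    E += [(f"v{i}", f"w{i - 1}") for i in range(2, k + 1)]
    # {(v_1, t)} and {(s, w_k)}
    E += [("v1", "t"), ("s", f"w{k}")]
    return V, E


V1, E1 = braess_graph(1)   # the original four-vertex Braess network
```

Counting the four sets gives $3k + (k-1) + 2 = 4k + 1$ edges, so the family grows linearly in $k$.
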

We next define latency functions $\ell^k$ for the edges of $B^k$; these functions will
prove useful in Proposition 5.4.2 below. For each edge of the form $e = (v_i, w_i)$, put
$\ell^k_e(x) = 0$; for an edge $e$ of the form $(v_i, w_{i-1})$, $(s, w_k)$, or $(v_1, t)$, put $\ell^k_e(x) = 1$;
for $i \in \{1, 2, \ldots, k\}$ and an edge $e$ of the form $(w_i, t)$ or $(s, v_{k-i+1})$, put $\ell^k_e(x)$ equal
to any nonnegative, continuous, and nondecreasing function satisfying
$\ell^k_e(\frac{k}{k+1}) = 0$ and $\ell^k_e(1) = i$ (thus, $\ell^k_e$ may be chosen to be convex and infinitely
differentiable, if desired).

We  can  now  show  how  to  use  the  Braess  graphs  to  construct  instances  on  which 
the  trivial  algorithm  for  GENERAL  LATENCY  NETWORK  DESIGN  performs  badly. 

Proposition 5.4.2 For any integer $n \ge 2$, there is an instance $(G, r, \ell)$ with
$|V(G)| = n$ for which the trivial algorithm produces a solution with value at least
$\lfloor n/2 \rfloor$ times that of the optimal solution.

Proof. We may suppose that $n$ is even and at least four (for $n$ odd, take a bad
example for $n - 1$ and add an isolated vertex). Write $n = 2k + 2$ for $k \in \mathbb{N}$
and consider the instance $(B^k, k, \ell^k)$. For $i = 1, \ldots, k$, let $P_i$ denote the path
$s \to v_i \to w_i \to t$. For $i = 2, \ldots, k$, let $Q_i$ denote the path $s \to v_i \to w_{i-1} \to t$; define
$Q_1$ to be the path $s \to v_1 \to t$ and $Q_{k+1}$ the path $s \to w_k \to t$. On one hand, routing
one unit of flow on each of $P_1, \ldots, P_k$ yields a flow at Nash equilibrium for $(B^k, k, \ell^k)$
demonstrating that $L(B^k, k, \ell^k) = k + 1$ (see Figure 5.4(a) for an illustration when
$k = 3$). On the other hand, if $H$ is the subgraph obtained from $B^k$ by deleting
all edges of the form $(v_i, w_i)$, routing $\frac{k}{k+1}$ units of flow on each of $Q_1, \ldots, Q_{k+1}$
yields a flow at Nash equilibrium for $(H, k, \ell^k)$ showing that $L(H, k, \ell^k) = 1$ (see
Figure 5.4(b)). Thus, $L(G)/L(H) = k + 1 = n/2$, completing the proof. ■

Figure 5.4: Proof of Proposition 5.4.2, when $k = 3$. Panel (b) shows the Nash flow in
the optimal subgraph. Solid edges carry flow in the flow at Nash equilibrium, dashed
edges do not. Edge latencies are with respect to flows at Nash equilibrium.
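
The two equilibria in this proof can be verified mechanically for small $k$. The sketch below is illustrative only. The definition of $\ell^k$ leaves the function on the edges $(w_i, t)$ and $(s, v_{k-i+1})$ free apart from two prescribed values; here we assume the particular choice $\ell(x) = \max\{0,\, i((k+1)x - k)\}$, which is continuous, nondecreasing, zero at $k/(k+1)$ and equal to $i$ at 1. The helper names and vertex labels are likewise assumptions of this example.

```python
from fractions import Fraction


def braess_paths(k):
    """Return the paths (P_1..P_k) and (Q_1..Q_{k+1}) of the k-th Braess graph."""
    P = [("s", f"v{i}", f"w{i}", "t") for i in range(1, k + 1)]
    Q = [("s", "v1", "t")]
    Q += [("s", f"v{i}", f"w{i - 1}", "t") for i in range(2, k + 1)]
    Q += [("s", f"w{k}", "t")]
    return P, Q


def edge_latency(e, x, k):
    """Latency of edge e at flow x, under the assumed choice described above."""
    u, v = e
    if u[0] == "v" and v[0] == "w" and u[1:] == v[1:]:
        return Fraction(0)                   # (v_i, w_i)
    if u == "s" and v[0] == "v":
        i = k - int(v[1:]) + 1               # (s, v_{k-i+1})
    elif u[0] == "w" and v == "t":
        i = int(u[1:])                       # (w_i, t)
    else:
        return Fraction(1)                   # (v_i, w_{i-1}), (s, w_k), (v_1, t)
    return max(Fraction(0), i * ((k + 1) * x - k))


def common_latency(available, path_flow, k):
    """Check the Nash condition over the available paths; return the common latency."""
    load = {}
    for p, x in path_flow.items():
        for e in zip(p, p[1:]):
            load[e] = load.get(e, Fraction(0)) + x
    cost = {p: sum(edge_latency(e, load.get(e, Fraction(0)), k)
                   for e in zip(p, p[1:])) for p in available}
    best = min(cost.values())
    assert all(cost[p] == best for p, x in path_flow.items() if x > 0)
    return best


def braess_values(k):
    """Return (L(B^k), L(H)) for H = B^k minus the edges (v_i, w_i)."""
    P, Q = braess_paths(k)
    full = common_latency(P + Q, {p: Fraction(1) for p in P}, k)
    sub = common_latency(Q, {q: Fraction(k, k + 1) for q in Q}, k)
    return full, sub
```

Because the paths $P_i$ and $Q_i$ are all of the $s$-$t$ paths in $B^k$, checking the Nash condition over `P + Q` (respectively over `Q` alone in the subgraph) is exhaustive.
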

5.4.3  Proof  of  Hardness 

We begin with an informal description of the reduction. Recall that in an instance
of the NP-hard problem Partition, we are given $q$ positive integers $\{a_1, a_2, \ldots, a_q\}$
and seek a subset $S \subseteq \{1, 2, \ldots, q\}$ such that $\sum_{j \in S} a_j = \frac{1}{2} \sum_{j=1}^{q} a_j$ [77, SP12]. The
idea of the reduction is to start with a Braess graph and replace the edges of the form
$(v_i, w_i)$ with a collection of parallel edges representing an instance $X = \{a_1, \ldots, a_q\}$
of Partition. We will endow these edges with latency functions that simulate
“capacities”, with an edge representing an integer $a_j$ of $X$ receiving capacity $a_j$.
Roughly  speaking,  if  too  many  edges  are  removed  from  the  network,  there  will  be 
insufficient  remaining  capacity  to  send  flow  cheaply;  if  too  few  edges  are  removed, 
the  excess  of  capacity  results  in  a  Nash  flow  similar  to  that  of  Figure  5.4(a);  and  if  X 
is  a  “yes”  instance  of  Partition  and  an  appropriate  collection  of  edges  is  removed, 
then  the  remaining  network  admits  a  Nash  flow  similar  to  that  of  Figure  5.4(b). 
(We  can  also  obtain  a  nearly  optimal  inapproximability  result  using  a  strongly  NP- 
complete  problem;  see  Remark  5.4.6  below.) 




Theorem 5.4.3 For $\epsilon > 0$, there is no $(\lfloor n/2 \rfloor - \epsilon)$-approximation algorithm for
General Latency Network Design unless P = NP.

Proof. We prove that for any fixed $n \ge 2$, there is no $(\lfloor n/2 \rfloor - \epsilon)$-approximation
algorithm for General Latency Network Design restricted to (multi)graphs
with $n$ vertices. (We can also restrict our instances to be simple networks and
derive a nearly optimal inapproximability result; see Remark 5.4.5 below.) As in
the proof of Proposition 5.4.2, we may assume that $n$ is even and at least four. Write
$n = 2k + 2$ for $k \in \mathbb{N}$. We will show that an $(\frac{n}{2} - \epsilon)$-approximation algorithm for
graphs with $n$ vertices enables us to differentiate between “yes” and “no” instances
of Partition in polynomial time.

Consider an instance $X = \{a_j\}_{j=1}^{q}$ of Partition, with each $a_j$ a positive integer.
We may assume that each $a_j$ is even, scaling if necessary. Put $A = \sum_{j=1}^{q} a_j$; the
traffic rate of interest to us is $r = k\frac{A}{2} + k + 1$. Obtain a graph $G$ from the $k$th Braess
graph $B^k$ by replacing each edge of the form $(v_i, w_i)$ by $q$ parallel edges, and denote
these by $e_i^1, e_i^2, \ldots, e_i^q$.

We now specify the edge latency functions $\ell$, which are more complicated than
in the previous subsection. We require a sufficiently small constant $\delta$ ($1/A(q+k)$
is small enough) and a sufficiently large constant $M$ ($n/2$ is large enough). In what
follows, the constant $M$ should be interpreted as a substitute for $+\infty$, and is used
to penalize a flow for violating an edge capacity constraint. We require the constant
$\delta$ to transform step functions (the type of function that would be most convenient
for our argument) into continuous functions (which are allowable in our model); $\delta$
provides a small “window” in which to “smooth out” the discontinuities of a step
function. For each edge $e$ of the form $(v_i, w_{i-1})$, $(s, w_k)$, or $(v_1, t)$, define $\ell_e(x) = 1$ for
$x \le 1$ and $\ell_e(x) = M$ for $x \ge 1 + \delta$ ($\ell_e$ may be defined arbitrarily on $(1, 1+\delta)$, subject
to the usual continuity and monotonicity restrictions). We say that these edges have
capacity 1. For an edge $e$ of the form $(w_i, t)$ or $(s, v_{k-i+1})$ (where $i \in \{1, \ldots, k\}$),
define $\ell_e(x) = 0$ for $x \le \frac{A}{2} + 1$, $\ell_e(x) = i$ when $x = \frac{A}{2} + \frac{k+1}{k}$, and $\ell_e(x) = M$ for
$x \ge \frac{A}{2} + \frac{k+1}{k} + \delta$; these edges have capacity $\frac{A}{2} + \frac{k+1}{k}$. Finally, for an edge $e$ of the
form $e_i^j$, define $\ell_e(x) = 0$ for $x \le a_j - \delta$, $\ell_e(a_j) = 1$, and $\ell_e(x) = M$ for $x \ge a_j + \delta$;
thus $e_i^j$ has capacity $a_j$. Each latency function can be described by a piecewise linear
function with a small (constant) number of rational breakpoints and slopes, and the
instance $(G, r, \ell)$ can be constructed from $X$ in polynomial time.
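
The three kinds of capacity-simulating latency functions can be written as piecewise linear functions with at most three breakpoints each. The sketch below is one possible realization under stated assumptions: the function names are hypothetical, linear interpolation is used wherever the text leaves the function unspecified, and in the sample values $\delta$ is taken to be $1/(A(q+k))$ and $M$ to be $n/2 = k + 1$.

```python
from fractions import Fraction


def piecewise(points):
    """Piecewise linear function through (x, y) breakpoints, constant outside them."""
    pts = [(Fraction(x), Fraction(y)) for x, y in points]

    def ell(x):
        x = Fraction(x)
        if x <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        return pts[-1][1]

    return ell


def unit_capacity_edge(delta, M):
    """Edges (v_i, w_{i-1}), (s, w_k), (v_1, t): latency 1 up to flow 1, then M."""
    return piecewise([(1, 1), (1 + delta, M)])


def outer_edge(i, A, k, delta, M):
    """Edges (w_i, t) and (s, v_{k-i+1}): capacity A/2 + (k+1)/k."""
    cap = Fraction(A, 2) + Fraction(k + 1, k)
    return piecewise([(Fraction(A, 2) + 1, 0), (cap, i), (cap + delta, M)])


def parallel_edge(a_j, delta, M):
    """Parallel edge e_i^j representing the integer a_j: capacity a_j."""
    return piecewise([(a_j - delta, 0), (a_j, 1), (a_j + delta, M)])


# Sample parameters (assumed for illustration): A = 8, q = 2, k = 2, so n = 6.
A, q, k = 8, 2, 2
delta, M = Fraction(1, A * (q + k)), Fraction(k + 1)
e = parallel_edge(4, delta, M)
o = outer_edge(2, A, k, delta, M)
u = unit_capacity_edge(delta, M)
```

Each function is nondecreasing as long as $i \le M$, which holds here because $i \le k < k + 1 = M$.
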

Analogous to the proof of Theorem 5.3.2, it suffices to prove the following two
statements: (i) if $X$ is a “yes” instance, then $G$ admits a subgraph $H$ with
$L(H, r, \ell) = 1$; and (ii) if $X$ is a “no” instance, then $L(H, r, \ell) \ge n/2$ for every subgraph
$H$ of $G$.

To prove (i), suppose that $\mathcal{I}$ admits a partition, and reindex the $a_j$’s
so that $\sum_{j=1}^{m} a_j = A/2$ for some $m \in \{1, 2, \dots, q-1\}$. Obtain $H$
from $G$ by deleting all edges of the form $e_i^j$ for $j > m$; thus, for each
$i = 1, \dots, k$, the remaining edges of the form $e_i^j$ have total capacity $A/2$.
Define the paths $Q_1, \dots, Q_{k+1}$ as in the proof of Proposition 5.4.2: for
$i = 2, \dots, k$, $Q_i$ denotes the path $s \to v_i \to w_{i-1} \to t$, $Q_1$ is the
path $s \to v_1 \to t$, and $Q_{k+1}$ is the path $s \to w_k \to t$. Define a feasible
flow $f$ as follows: for each $i = 1, \dots, k$ and $j = 1, \dots, m$, route $a_j$
units of flow on the unique path containing edge $e_i^j$, and route 1 unit of flow on
the path $Q_i$ for $i = 1, 2, \dots, k+1$. The flow $f$ is at Nash equilibrium for
$(H, r, \ell)$ and proves that $L(H, r, \ell) = 1$ (see Figure 5.5(a)).

In proving (ii), we first consider only subgraphs $H$ that contain all edges not of
the form $e_i^j$ (i.e., $H$ may be obtained from $G$ by deleting only some of the
parallel edges); as we will see, this case captures all of the difficulties of the
proof. There are two subcases to consider.

Case 1: Suppose for each $i = 1, \dots, k$, the total capacity $A_i$ of edges of the
form $e_i^j$ in $H$ is at least $A/2$. Since $\mathcal{I}$ is a “no” instance and each
$a_j$ is even, $A_i \ge A/2 + 2$ for each $i$. Then, define a flow $f$ in $G$ as
follows: for each $i = 1, \dots, k$ and $j = 1, \dots, q$ such that $e_i^j$ is present
in $H$, route $\frac{a_j}{A_i}\left(\frac{A}{2} + \frac{k+1}{k}\right)$ units of flow
along the unique $s$-$t$ path containing $e_i^j$. The flow $f$ is at Nash equilibrium
and proves that $L(H) = n/2$ (see Figure 5.5(b)).
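
The arithmetic behind Case 1 can be checked on a toy instance. The sketch below assumes the proportional split $\frac{a_j}{A_i}(\frac{A}{2} + \frac{k+1}{k})$ (the quantity is damaged in the scanned source, so this reading is an assumption), a traffic rate $r = k\frac{A}{2} + k + 1$, and a hypothetical Partition instance $\{2, 2, 8\}$ with $k = 3$. It confirms that the flow routes exactly $r$ units and that every parallel edge stays at least $\delta$ below its capacity, so these edges contribute zero latency.

```python
from fractions import Fraction

def case1_flows(kept, A, k):
    """Flow placed on each parallel edge e_i^j kept in H, for one fixed i.

    kept: capacities a_j of the parallel edges of the form e_i^j present in H.
    A:    sum of all a_j in the Partition instance.
    k:    parameter of the Braess graph B^k.
    """
    A_i = sum(kept)
    per_group = Fraction(A, 2) + Fraction(k + 1, k)   # total flow through v_i and w_i
    return [Fraction(a_j, A_i) * per_group for a_j in kept]

# Hypothetical "no" instance of Partition: no subset of {2, 2, 8} sums to 6.
a, k = [2, 2, 8], 3
A, q = sum(a), len(a)
delta = Fraction(1, A * (q + k))
r = k * Fraction(A, 2) + k + 1

kept = [8]                               # H keeps only the edge of capacity 8 (A_i = 8 >= A/2 + 2)
flows = case1_flows(kept, A, k)
print(flows, k * sum(flows) == r)
```

With zero latency on the parallel edges, the path through $v_i$ and $w_i$ has latency $(k - i + 1) + i = k + 1 = n/2$, as the proof claims.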

Case 2: Suppose for some $i \in \{1, \dots, k\}$, the total capacity $A_i$ of edges of
the form $e_i^j$ in $H$ is less than $A/2$ (and thus is at most $A/2 - 2$). Here, we
will exploit the fact that all edges of the network are (essentially) capacitated to
prove that a flow at Nash equilibrium must have large cost. Call an edge $e$
oversaturated by a flow $f$ if $f_e$ exceeds the capacity of $e$ by at least $\delta$
(and thus $\ell_e(f_e) = M \ge n/2$). A key observation is that if $f$ is at Nash
equilibrium for $(H, r, \ell)$ and oversaturates some edge, then
$L(H, r, \ell) \ge n/2$. Now, since the total capacity of edges out of $v_i$ is at
most $A/2 - 1$ (recall $(v_i, w_{i-1})$ has capacity 1), any flow that places at least
$\frac{A}{2} - 1 + q\delta$ units of flow on $(s, v_i)$ will oversaturate some edge
out of $v_i$. On the other hand, the total capacity of edges incident to $s$ is
$k\frac{A}{2} + k + 2 = r + 1$, so any feasible flow must either place at least
$\frac{A}{2} - 1 + q\delta$ units of flow on $(s, v_i)$ or oversaturate some other
edge out of $s$ (for $\delta$ sufficiently small). We conclude that any flow feasible
for $(H, r, \ell)$ oversaturates at least one edge, and hence $L(H) \ge n/2$.

Figure 5.5: Proof of Theorem 5.4.3. (a) A good Nash flow corresponding to a “yes”
instance of Partition, with $m = 2$. (b) A bad Nash flow in a network with excess
capacity. Solid edges carry flow in the flow at Nash equilibrium, dashed edges do not.
Edge latencies are with respect to flows at Nash equilibrium.


Finally, suppose $H$ fails to contain an edge that is not of the form $e_i^j$. If for
some $i \in \{1, 2, \dots, k\}$, the total capacity of edges of the form $e_i^j$ is at
most $A/2$, then the argument of Case 2 still applies to show that $L(H) \ge n/2$:
the previous argument merely required that any feasible flow oversaturates some edge,
and this fact remains valid if we remove further edges. Also, if $H$ fails to contain
an edge of the form $(s, v_i)$ or $(w_i, t)$, then simple capacity considerations show
that any feasible flow in $H$ oversaturates some edge incident to $s$ or $t$,
respectively. If $H$ contains all edges of the form $(s, v_i)$ and $(w_i, t)$ and the
total capacity of edges of the form $e_i^j$ in $H$ is at least $A/2$ for each $i$,
then the argument of Case 1 applies (by hypothesis, all edges used by the Nash flow in
that case are present in $H$), showing that $L(H) = n/2$. This exhausts all possible
cases, and the proof is complete. ■

The matching upper and lower bounds of Theorems 5.4.1 and 5.4.3 have strong negative
consequences for the problem of detecting harmful extraneous edges, as in the linear
latency function setting (see Corollary 5.3.3). Defining an instance $(G, r, \ell)$
with general latency functions and $n$ vertices to be paradox-free if
$L(H, r, \ell) \ge L(G, r, \ell)$ for all subgraphs $H$ of $G$ and paradox-ridden if
for some subgraph $H$ of $G$,
$L(H, r, \ell) = (\lfloor n/2 \rfloor)^{-1} L(G, r, \ell)$, we obtain the following
corollary.

Corollary 5.4.4 Given an instance $(G, r, \ell)$ with general latency functions that
is either paradox-free or paradox-ridden, it is NP-hard to decide whether or not
$(G, r, \ell)$ is paradox-ridden.

Remark 5.4.5 The reduction of Theorem 5.4.3 also shows that, for any constant
$\epsilon > 0$, there is no $O(n^{1-\epsilon})$-approximation algorithm for General
Latency Network Design restricted to simple graphs (unless P = NP). To see why,
choose a positive integer $k$ satisfying $k \ge 1/\epsilon$ and for a Partition
instance $\mathcal{I}$ with $q$ items, mimic the previous reduction beginning with the
Braess graph $B^{q^k}$ on $2q^k + 2$ vertices. Subdividing all parallel edges in the
resulting multigraph yields a simple graph $G$ (whose size is polynomial in that of
$\mathcal{I}$) with $n = q^{k+1} + 2q^k + 2$ vertices. Defining $r$ and $\ell$ as in
the proof of Theorem 5.4.3, $G$ has a subgraph $H$ satisfying $L(H, r, \ell) = 1$ if
$\mathcal{I}$ is a “yes” instance while $L(H, r, \ell) \ge q^k + 1$ for every subgraph
$H$ if $\mathcal{I}$ is a “no” instance. Since $q^k + 1 = \Omega(n^{k/(k+1)})$ and
$k/(k+1) > 1 - \epsilon$, no $O(n^{1-\epsilon})$-approximation algorithm exists for
General Latency Network Design restricted to simple graphs, unless P = NP.

Remark 5.4.6 The reduction of Theorem 5.4.3 makes use of the weakly NP-hard problem
Partition [77], and it is thus reasonable to ask for approximation algorithms with
good performance guarantee but pseudopolynomial running time (running time polynomial
in the network size and in the unary representation of the numbers used to describe
the latency functions). Unfortunately, such an algorithm cannot exist (unless
P = NP): the non-existence of an $O(n^{1-\epsilon})$-approximation algorithm for
network design on simple graphs can also be derived from the more complicated
construction in the proof of Theorem 5.5.6 below, and this construction relies only
on the strongly NP-hard problem 2DDP of Section 5.3.

5.5 Polynomials of Bounded Degree: An Approximability Threshold of $\Theta(p/\ln p)$

In  this  section,  we  aim  to  show  that  the  strong  hardness  results  of  Sections  5.3 
and  5.4  extend  beyond  the  particular  classes  of  linear  and  general  latency  functions, 
and  seem  intrinsic  to  the  problem  of  designing  networks  for  selfish  users.  We  will 
focus  on  networks  with  latency  functions  that  are  polynomials  of  bounded  degree, 
but  our  techniques  will  also  have  consequences  for  networks  possessing  other  types 
of  well-behaved  latency  functions. 

As in Section 5.3, we begin by observing that our previous work bounding the
worst-case inefficiency of flows at Nash equilibrium yields an upper bound on the
performance guarantee of the trivial algorithm. Recall from Proposition 3.5.5 that in
an instance with polynomial latency functions of degree $p$ (with all polynomial
coefficients nonnegative), the total latency of a flow at Nash equilibrium is at most
$[1 - p \cdot (p+1)^{-(p+1)/p}]^{-1}$ times that of any other feasible flow. For
clarity, we will work with the following weaker form of Proposition 3.5.5.
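
To get a feel for how this bound behaves, the short sketch below evaluates $[1 - p\,(p+1)^{-(p+1)/p}]^{-1}$ for a few degrees and prints $p/\ln p$ next to it for orientation. The function name `poa_polynomial` is ours, not the thesis's; the comparison column is shown only to illustrate the growth rate, not to assert a particular constant.

```python
import math

def poa_polynomial(p):
    """Evaluate [1 - p (p+1)^(-(p+1)/p)]^(-1), the bound recalled from Proposition 3.5.5."""
    return 1.0 / (1.0 - p * (p + 1) ** (-(p + 1) / p))

for p in (1, 2, 10, 100, 1000):
    bound = poa_polynomial(p)
    growth = p / math.log(p) if p > 1 else float("nan")   # p / ln p is undefined at p = 1
    print(f"p={p:5d}  bound={bound:8.3f}  p/ln p={growth:8.3f}")
```

For $p = 1$ the expression evaluates to $4/3$, the familiar bound for linear latency functions.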

Corollary 5.5.1 There is a constant $c_1 > 0$ so that the following statement holds:
if $p \ge 2$ and $(G, r, \ell)$ is an instance with polynomial latency functions of
degree $p$ for which $f^*$ is feasible and $f$ is a flow at Nash equilibrium, then
$C(f) \le c_1 \frac{p}{\ln p} \cdot C(f^*)$.

As  with  linear  latency  functions  (see  Corollary  5.3.1),  we  immediately  obtain  an 
upper  bound  on  the  performance  guarantee  of  the  trivial  algorithm  for  our  network 
design  problem  restricted  to  networks  with  polynomial  latency  functions  of  degree 
p.  We  call  this  problem  Polynomial(p)  Latency  Network  Design. 

Corollary 5.5.2 There is a constant $c_1 > 0$ so that, for any $p \ge 2$, the trivial
algorithm is a $c_1 \frac{p}{\ln p}$-approximation algorithm for Polynomial(p) Latency
Network Design.

We next work toward a proof of a matching hardness result. As in Section 5.4, we
first give a family of networks (one network for each value of $p \ge 2$) on which
the trivial algorithm performs poorly, and then describe how to obtain a general
inapproximability result.

Proposition 5.5.3 There is a constant $c_2 > 0$ so that, for any $p \ge 2$, the
worst-case performance guarantee of the trivial algorithm is at least
$c_2 \frac{p}{\ln p}$ for Polynomial(p) Latency Network Design.

Proof. We will again make use of the Braess graphs of Subsection 5.4.2. In
Section 5.4, we exploited the fact that general latency functions can be arbitrarily
steep to construct a bad example for the trivial algorithm; here, we adapt the
previous argument as best we can, given that only low-degree polynomials are
available to us.

For a fixed integer $p$, define a set of latency functions $\ell^k$ for the edges of
$B^k$ as follows (where $k$ is a parameter, depending on $p$, to be chosen later):
for each edge of the form $e = (v_i, w_i)$, put $\ell^k_e(x) = 0$; for an edge $e$ of
the form $(v_i, w_{i-1})$, $(s, w_k)$, or $(v_1, t)$, put $\ell^k_e(x) = 1$; for an
edge $e$ of the form $(w_i, t)$ or $(s, v_{k-i+1})$, put $\ell^k_e(x) = i x^p$. Next,
consider the instance $(B^k, k, \ell^k)$ and define paths $P_1, \dots, P_k$ and
$Q_1, \dots, Q_{k+1}$ as in Proposition 5.4.2. On one hand, routing one unit of flow
on each of $P_1, \dots, P_k$ yields a flow at Nash equilibrium for $(B^k, k, \ell^k)$
showing that $L(B^k, k, \ell^k) = k + 1$ (as in Figure 5.4(a)). On the other hand, if
$H$ is the subgraph obtained from $B^k$ by deleting all edges of the form
$(v_i, w_i)$, routing $\frac{k}{k+1}$ units of flow on each of $Q_1, \dots, Q_{k+1}$
yields a flow at Nash equilibrium for $(H, k, \ell^k)$ showing that
$L(H, k, \ell^k) = 1 + k\left(\frac{k}{k+1}\right)^p$ (cf. Figure 5.4(b)). Thus,

$$L(H, k, \ell^k) = 1 + k\left(1 - \frac{1}{k+1}\right)^p \le 1 + k e^{-p/(k+1)}.$$

For $p$ sufficiently large, we may put $k = \lfloor p/\ln p \rfloor - 1 \ge 1$ to
obtain $L(H, k, \ell^k) \le 2$ and $L(B^k, k, \ell^k) = \lfloor p/\ln p \rfloor$, so
the ratio of the two values is at least $\frac{1}{2}\lfloor p/\ln p \rfloor$; this
completes the proof. ■
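
The two Nash flows in this proof are easy to tabulate. The sketch below is a numerical illustration only: it assumes the reading $k = \lfloor p/\ln p \rfloor - 1$ and a per-path flow of $k/(k+1)$ in $H$ (both are damaged in the scanned source), and the function name is hypothetical. It checks that all used paths have equal latency in each network; it does not verify that unused paths are no shorter, which the Nash condition also requires.

```python
import math

def braess_polynomial_gap(p):
    """Path latencies in B^k and in H for k = floor(p / ln p) - 1."""
    k = math.floor(p / math.log(p)) - 1
    assert k >= 1, "p too small for this choice of k"
    x = k / (k + 1)                      # flow on each of Q_1, ..., Q_{k+1} in H

    # Paths Q_i in H: the two x^p edges have coefficients summing to k,
    # and the constant-latency edge contributes 1.
    q_latencies = [k * x**p + 1]                                  # Q_1 = s -> v_1 -> t
    q_latencies += [(k - i + 1) * x**p + 1 + (i - 1) * x**p       # Q_i, 2 <= i <= k
                    for i in range(2, k + 1)]
    q_latencies += [1 + k * x**p]                                 # Q_{k+1} = s -> w_k -> t

    # Paths P_i = s -> v_i -> w_i -> t in B^k, each carrying one unit of flow.
    p_latencies = [(k - i + 1) + 0 + i for i in range(1, k + 1)]
    return k, p_latencies, q_latencies

k, p_lat, q_lat = braess_polynomial_gap(50)
print(k, p_lat[0], round(q_lat[0], 4))
```

Deleting the zero-latency edges thus drops the common latency from $k + 1$ to a value below 2, which is the gap the proposition exploits.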

Remark 5.5.4 In the proof of Proposition 5.5.3, we have avoided optimizing constants
for the sake of readability. We will make this tradeoff repeatedly in the rest of
this section.

Finally, we extend our lower bound on the performance guarantee of the trivial
algorithm to an inapproximability result. This task is more difficult than in
Section 5.4; a crucial part of the hardness proof of that section leveraged the fact
that general latency functions can model edge capacities. This is not entirely
possible with low-degree polynomials, and we are forced instead to adapt the
arguments of Section 5.3 to larger Braess graphs; in particular, our reduction is
from the 2 Directed Disjoint Paths problem rather than from Partition. In essence,
restricting the allowable class of latency functions forces us to encode the
intractability of an NP-hard problem into the network topology of a network design
instance rather than into the edge latency functions.

In  preparation  for  the  reduction,  we  require  one  preliminary  result.  The  next 
proposition  is  a  special  case  of  a  theorem  of  Hall  [83]. 

Proposition 5.5.5 ([83]) Let $G$ be a network with polynomial latency functions
$\ell$ and a single source-destination pair. Then $L(G, r, \ell)$ is a nondecreasing
function of $r$.

We  can  now  prove  that  designing  networks  for  selfish  users  is  hard  in  networks 
with  polynomial  edge  latency. 

Theorem 5.5.6 There is a constant $c_3 > 0$ so that the following statement holds: if
$p \ge 2$ and $\epsilon > 0$, then no $(c_3 \frac{p}{\ln p} - \epsilon)$-approximation
algorithm for Polynomial(p) Latency Network Design exists, unless P = NP.

Proof. Fix a sufficiently large integer $p$, and put
$k = \lfloor \frac{p}{16 \ln p} \rfloor - 1$ (which is at least 1 for large enough
$p$). For any $\epsilon > 0$, we will show that a
$(\frac{k}{5} - \epsilon)$-approximation algorithm for Polynomial(p) Latency Network
Design enables us to differentiate between “yes” and “no” instances of the 2 Directed
Disjoint Paths (2DDP) problem in polynomial time (for a definition of 2DDP, see the
proof of Theorem 5.3.2).

Consider an instance $\mathcal{I} = \{G, s_1, s_2, t_1, t_2\}$ of 2DDP; we construct
an instance $(G', k, \ell)$ of Polynomial(p) Latency Network Design as follows
(illustrated in Figure 5.6). To define the graph $G'$, we begin with $k$ copies of
$G$; call them $G^1, \dots, G^k$ and denote the copy of $s_i$ ($t_i$) in $G^j$ by
$s_i^j$ ($t_i^j$). Next, add auxiliary vertices $s$, $t$, $v_1, \dots, v_{k-1}$, and
$w_1, \dots, w_{k-1}$. The edge set of $G'$ is as follows:

• each $G^j$ inherits the edge set of $G$

• for $i = 1, \dots, k-1$, we include edges from $s$ to $v_i$, from $v_i$ to $s_2^i$
and $s_1^{i+1}$, from $t_2^i$ and $t_1^{i+1}$ to $w_i$, and from $w_i$ to $t$

• we include edges $(s, s_1^1)$, $(s, s_2^k)$, $(t_1^1, t)$, $(t_2^k, t)$.

We  define  latency  functions  on  the  edges  of  G'  as  follows: 

(A) for edges of the form $(v_i, s_2^i)$ or $(t_1^{i+1}, w_i)$, put $\ell(x) = 1$

(B) for edges $(s, s_1^1)$ and $(t_2^k, t)$, put
$\ell(x) = 2 + \left(1 + \frac{1}{k}\right)^p x^p$

(C) for $(s, s_2^k)$ and $(t_1^1, t)$, put
$\ell(x) = 1 + k\left(\frac{4(k+1)}{4k+1}\right)^p x^p$

(D) for $i = 1, \dots, k-1$ and edges $(s, v_i)$ and $(w_{k-i}, t)$, put
$\ell(x) = i\left(\frac{4(k+1)}{4k+1}\right)^p x^p$

(E) for edges of the form $(v_i, s_1^{i+1})$ or $(t_2^i, w_i)$, put
$\ell(x) = 2 + \left(2 + \frac{2}{k}\right)^p x^p$

(F) for edges in $G^1, \dots, G^k$, put $\ell(x) = 0$.

We will call edges of the form $(v_i, s_2^i)$ or $(t_1^{i+1}, w_i)$ type A edges, and
so forth. It is clear that $(G', k, \ell)$ can be constructed from $\mathcal{I}$ in
polynomial time.

Next, we claim that if $\mathcal{I}$ is a “yes” instance of 2DDP, then there is a
subgraph $H$ of $G'$ satisfying $L(H, k, \ell) \le 5$. To see why, let $P_1^*$ and
$P_2^*$ denote vertex-disjoint $s_1$-$t_1$ and $s_2$-$t_2$ paths in $G$. Deleting all
edges in $G'$ that lie in some copy $G^j$ of $G$ but not on (the corresponding copy
of) either $P_1^*$ or $P_2^*$, we obtain a subgraph $H$ of $G'$ that is the union of
$2k$ distinct $s$-$t$ paths. Routing $\frac{k}{k+1}$ units of flow on the path
containing $s_1^1$ and $t_1^1$ and on the path containing $s_2^k$ and $t_2^k$, and
$\frac{k}{2(k+1)}$ units of flow on each of the other $2k - 2$ paths, we obtain a
flow at Nash equilibrium for $(H, k, \ell)$. This flow proves that

$$L(H, k, \ell) = 4 + k\left(\frac{4(k+1)}{4k+1} \cdot \frac{k}{k+1}\right)^p
= 4 + k\left(1 - \frac{1}{4k+1}\right)^p \le 4 + k e^{-p/(4k+1)} \le 5,$$

with the picture of this Nash flow somewhat analogous to Figure 5.4(b).

Figure 5.6: Proof of Theorem 5.5.6. Construction of $(G', k, \ell)$ when $k = 3$.
Edges are labeled with their edge type.
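
The equal-latency claim for the “yes” case can be verified with exact rational arithmetic. The sketch below rests on assumptions that should be stated loudly: the type B and type E coefficients are taken to be $(1 + \frac{1}{k})^p$ and $(2 + \frac{2}{k})^p$, the flow values are taken to be $\frac{k}{k+1}$ and $\frac{k}{2(k+1)}$, and $k = \lfloor p/(16 \ln p) \rfloor - 1$; all of these are readings of a damaged scan, chosen because they make the displayed value $4 + k(1 - \frac{1}{4k+1})^p$ come out exactly. The function name is hypothetical.

```python
from fractions import Fraction
import math

def yes_instance_path_latencies(k, p):
    """Exact latencies of the 2k flow paths of H under the assumed Nash flow."""
    c = Fraction(4 * (k + 1), 4 * k + 1)     # base of the type C and type D coefficients
    big = Fraction(k, k + 1)                 # flow on the two paths using type B and C edges
    small = Fraction(k, 2 * (k + 1))         # flow on each of the other 2k - 2 paths

    type_a = 1
    type_b = 2 + (Fraction(k + 1, k) * big) ** p            # 2 + (1 + 1/k)^p x^p
    type_c = 1 + k * (c * big) ** p                          # 1 + k (4(k+1)/(4k+1))^p x^p
    type_e = 2 + (Fraction(2 * (k + 1), k) * small) ** p     # 2 + (2 + 2/k)^p x^p

    # Paths through (s_1^1, t_1^1) and through (s_2^k, t_2^k).
    latencies = [type_b + type_c, type_c + type_b]
    for i in range(1, k):
        # Each type D edge is shared by two flow paths, so it carries 2 * small.
        d_out = i * (c * 2 * small) ** p                     # edge (s, v_i)
        d_in = (k - i) * (c * 2 * small) ** p                # edge (w_i, t)
        latencies.append(d_out + type_e + type_a + d_in)     # via s_1^{i+1} and t_1^{i+1}
        latencies.append(d_out + type_a + type_e + d_in)     # via s_2^i and t_2^i
    return latencies

p = 400
k = math.floor(p / (16 * math.log(p))) - 1                   # k = 3 for p = 400
latencies = yes_instance_path_latencies(k, p)
print(k, len(latencies), float(latencies[0]))
```

Because $H$ is a union of $2k$ paths that meet only at shared auxiliary edges, equal latency on all of them is what the Nash condition asks for here.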

Finally, we show that if $\mathcal{I}$ is a “no” instance of 2DDP, then
$L(H, k, \ell) \ge k$ for all subgraphs $H$ of $G'$. We will prove this in two steps.
First, we will show that unless $H$ contains most of the edges in $G'$, “capacity
considerations” (similar to those used in the proof of Theorem 5.4.3) imply that
$L(H)$ is large. Second, we show that if $H$ contains most of the edges in $G'$, then
the flow at Nash equilibrium in $H$ is similar to the bad Nash flow of
Proposition 5.5.3, again showing $L(H)$ to be large.

Fix a subgraph $H$ of $G'$ containing an $s$-$t$ path, and let $f$ be an acyclic Nash
flow in $(H, k, \ell)$ (see Proposition 2.6.2). We claim that if some type A or C
edge of $G'$ does not carry flow in $f$ (in particular, if some such edge is not in
$H$), then $L(H) \ge k$. We will prove the claim for an edge of the form
$(v_i, s_2^i)$; the argument for an edge of the form $(t_1^{i+1}, w_i)$ is symmetric,
and the argument for type C edges is similar (and easier).

To prove this claim, we first observe that many edges of $H$ are essentially
capacitated, in the following sense. We assert that any of the following events
forces $L(H) \ge p \ge k$ (using that $L(H) \ge \ell_e(f_e)$ for any edge $e$ with
$f_e > 0$):

(1) $f_e \ge \frac{8k+1}{8(k+1)}$ for a type B edge $e$

(2) $f_e \ge \frac{2k+1}{2(k+1)}$ for an edge $e$ of type C or D

(3) $f_e \ge \frac{4k+1}{8(k+1)}$ for a type E edge $e$.

For example, we can derive

$$\left(\frac{4(k+1)}{4k+1} \cdot \frac{2k+1}{2(k+1)}\right)^p
= \left(1 + \frac{1}{4k+1}\right)^p \ge \left(e^{1/(8k+2)}\right)^p
= e^{p/(8k+2)} \ge p,$$

proving (2). The calculations for (1) and (3) are similar, so we omit them.

Now assume that edge $(v_i, s_2^i)$ does not carry any flow in $f$. Then, either
event (3) occurs (with edge $(v_i, s_1^{i+1})$) or else edge $(s, v_i)$ carries at
most $\frac{4k+1}{8(k+1)}$ units of flow; assume the latter. We claim that in this
case, event (1) or event (2) must occur with some edge incident to $s$. For if not,
edges incident to $s$ carry at most

$$(k-1) \cdot \frac{2k+1}{2(k+1)} + \frac{4k+1}{8(k+1)} + \frac{8k+1}{8(k+1)}
= \frac{8k^2 + 8k - 2}{8(k+1)} < k$$

units of flow, contradicting that $f$ is an $s$-$t$ flow carrying $k$ units of flow.
We conclude that if edge $(v_i, s_2^i)$ does not carry flow in $f$, then some event
of the form (1), (2), or (3) occurs, proving that $L(H) \ge k$.
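
The three “capacity-like” events and the counting step can be checked numerically. As before, this is a sketch under assumptions: the thresholds $\frac{8k+1}{8(k+1)}$, $\frac{2k+1}{2(k+1)}$, $\frac{4k+1}{8(k+1)}$ and the type B and E coefficients $(1 + \frac{1}{k})^p$, $(2 + \frac{2}{k})^p$ are reconstructed from a damaged scan, and the function name and the sample values $k = 3$, $p = 400$ are ours.

```python
from fractions import Fraction

def capacity_events(k, p):
    """Latency reached at each threshold in events (1)-(3), plus the counting bound."""
    c = Fraction(4 * (k + 1), 4 * k + 1)
    thr_b = Fraction(8 * k + 1, 8 * (k + 1))      # event (1), type B edges
    thr_cd = Fraction(2 * k + 1, 2 * (k + 1))     # event (2), type C and D edges
    thr_e = Fraction(4 * k + 1, 8 * (k + 1))      # event (3), type E edges

    lat_b = 2 + (Fraction(k + 1, k) * thr_b) ** p
    lat_d = 1 * (c * thr_cd) ** p                 # smallest type D coefficient (i = 1)
    lat_e = 2 + (Fraction(2 * (k + 1), k) * thr_e) ** p

    # Most flow that can leave s when (s, v_i) carries at most thr_e and
    # neither event (1) nor event (2) occurs on the other k edges out of s.
    max_out_of_s = (k - 1) * thr_cd + thr_e + thr_b
    return lat_b, lat_d, lat_e, max_out_of_s

k, p = 3, 400
lat_b, lat_d, lat_e, max_out = capacity_events(k, p)
print(float(lat_b) >= p, float(lat_d) >= p, float(lat_e) >= p, max_out, max_out < k)
```

Since a type C edge at the same threshold has latency at least that of a type D edge with $i = 1$, checking the smallest type D coefficient covers event (2) in full.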

It remains to consider subgraphs $H$ of $G'$ in which all edges of type A or C carry
flow in the Nash flow $f$ of $(H, k, \ell)$, and to make use of our hypothesis that
$\mathcal{I}$ is a “no” instance of 2DDP. The presence of these edges in $H$ (all of
which lie on $s$-$t$ paths in $H$, since they carry flow in the acyclic flow $f$),
together with the assumption that $\mathcal{I}$ is a “no” instance, implies that for
each $i = 1, 2, \dots, k$ there is an $s$-$t$ path $P_i$ in $H$ containing the
vertices $s_2^i$ and $t_1^i$ (cf. the proof of Theorem 5.3.2). Letting
$r = k \cdot \frac{4k+1}{4(k+1)} \le k$, the following flow is then at Nash
equilibrium for $(H, r, \ell)$: for $i = 1, 2, \dots, k$ route $\frac{4k+1}{4(k+1)}$
units of flow on $P_i$. This flow shows that $L(H, r, \ell) = k + 3$ (this Nash flow
is essentially the same as the bad Nash flow of Proposition 5.5.3). By
Proposition 5.5.5, $L(H, \cdot, \ell)$ is a nondecreasing function of the traffic
rate (with $H$, $\ell$ fixed); hence, $L(H, k, \ell) \ge k + 3$.
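
A quick exact computation confirms the value $k + 3$ for this bad flow. The sketch assumes a per-path flow of $\frac{4k+1}{4(k+1)}$ (a reading of a damaged quantity, picked because it makes every type C and D term collapse to its coefficient) and the edge types (A), (C), (D) as listed in the construction; the function name is hypothetical.

```python
from fractions import Fraction

def no_instance_path_latencies(k, p):
    """Traffic rate and exact latencies of P_1, ..., P_k under the assumed flow."""
    y = Fraction(4 * k + 1, 4 * (k + 1))       # flow on each path P_i
    c = Fraction(4 * (k + 1), 4 * k + 1)
    unit = (c * y) ** p                         # (c * y)^p equals 1 exactly

    type_a = 1
    type_c = 1 + k * unit

    def type_d(i):
        return i * unit

    latencies = []
    for i in range(1, k + 1):
        # Reaching s_2^i: directly on the type C edge if i = k, else via v_i.
        first = type_c if i == k else type_d(i) + type_a
        # Leaving t_1^i: directly on the type C edge if i = 1, else via w_{i-1}.
        last = type_c if i == 1 else type_a + type_d(k - i + 1)
        latencies.append(first + last)
    return k * y, latencies

r, latencies = no_instance_path_latencies(3, 400)
print(r, latencies)
```

Each path $P_i$ is the only flow path using its type C or D edges, so every such edge carries exactly the per-path flow.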

We have shown that if $\mathcal{I}$ is a “no” instance of 2DDP, then
$L(H, k, \ell) \ge k$ for all subgraphs $H$ of $G'$, and the proof is complete. ■

Remark 5.5.7 The results of this section can be extended to networks with other types
of latency functions with only minor modifications to the proofs. For example, in
Sections A.1 and A.2 we introduce the notion of the incline of an instance
(intuitively, a real number $\gamma \ge 1$ that measures the “steepness” of the
network latency functions) and prove that the price of anarchy in instances with
incline $\gamma$ is at most $\gamma$. By the same argument as in Corollaries 5.3.1
and 5.5.2, this implies that the trivial algorithm is a $\gamma$-approximation
algorithm for instances with incline at most $\gamma$. A straightforward modification
of the proof of Theorem 5.5.6 shows that there is a constant $c > 0$ such that, for
each $\gamma \ge 1$ and $\epsilon > 0$, there is no
$(c \cdot \gamma - \epsilon)$-approximation algorithm for instances with incline at
most $\gamma$ (unless P = NP). We leave the details of the proof and further
extensions of this sort to the interested reader.


Chapter  6 

Stackelberg  Routing 


6.1  Introduction 

In  this  chapter,  we  pursue  a  second  approach  to  coping  with  selfishness.  In  many 
networks,  there  will  be  a  mix  of  “selfishly  controlled”  and  “centrally  controlled” 
traffic — that  is,  the  network  is  used  by  both  selfish  individuals  and  some  central 
authority.  For  example,  clients  of  a  network  may  be  charged  at  two  different  prices: 
clients  paying  the  higher  price  are  given  access  to  the  network  and  the  ability  to  route 
their  own  traffic  (presumably  along  a  minimum-latency  path),  while  clients  paying 
only  the  “bargain  rate”  can  use  the  network  but  have  no  control  over  how  their 
traffic  is  routed  (and  thus  this  traffic  qualifies  as  centrally  controlled).  Also,  Korilis 
et  al.  [105]  consider  networks  that  allow  a  large  customer  to  set  up  a  so-called  virtual 
private  network  of  guaranteed  and  preassigned  virtual  paths  for  ongoing  use  [21], 
and  argue  that  the  bandwidth  needed  for  a  virtual  private  network  may  be  viewed  as 
centrally  controlled  (with  the  paths  chosen  by  the  network  manager)  while  individual 
users  of  the  network  continue  to  behave  in  a  selfish  and  independent  fashion. 

We investigate the following question: given a network with centrally and selfishly
controlled traffic, how should centrally controlled traffic be routed to induce
“good” (albeit selfish) behavior from the noncooperative users? This indirect
approach to controlling selfish behavior has several appealing aspects: no
communication is required between network users and an algorithm, no notion of
currency is needed (cf. the approach of algorithmic mechanism design [136, 137,
155]), no resources need to be added to or removed from the network, and the routing
of centrally controlled traffic is often easily modified as the amount of traffic
evolves over time.

6.1.1 Summary of Results

We consider a game in which the roles of different players are asymmetric. One player
(responsible for routing the centrally controlled traffic and interested in
minimizing total latency) acts as a leader, in that it may hold its routing (its
strategy) fixed while all other players (the followers) react independently and
selfishly to the leader’s strategy, reaching a Nash equilibrium relative to it. These
types of games, called Stackelberg games, and the resulting Stackelberg equilibria
have been well studied in the game theory literature (see Subsection 6.1.3 below).

We  study  the  following  questions: 

(1) among all leader strategies for a given network, can we characterize and/or
compute the strategy inducing the Stackelberg equilibrium, that is, the equilibrium
of minimum total latency?

(2) what is the worst-case ratio between the total latency of the Stackelberg
equilibrium and that of the optimal routing of all of the traffic?

For networks with arbitrary edge latency functions but consisting only of two nodes
and a collection of parallel links, we give a simple algorithm for computing a leader
strategy that induces an equilibrium with total latency no more than $1/\beta$ times
that of the minimum-latency flow, where $\beta$ denotes the fraction of traffic that
is centrally controlled. This algorithm runs in polynomial time provided the network
latency functions are standard (see Definition 2.3.5). We also show that no stronger
guarantee is possible in networks of parallel links, and that this guarantee cannot
be achieved in general networks. Thus, in a network of parallel links, a manager
controlling a constant fraction of the network traffic can induce an equilibrium with
total latency at most a constant factor larger than that incurred by the
minimum-latency flow. This result stands in sharp contrast to our results about Nash
equilibria, as the nonlinear variant of Pigou’s example (Subsection 2.4.4) shows that
the total latency of a flow at Nash equilibrium can be arbitrarily larger than that
of a minimum-latency flow, even in networks of parallel links.

For networks of parallel links in which every edge latency function is linear in the
edge congestion, we give a simple algorithm that runs in $O(m^2)$ time and computes a
strategy inducing an equilibrium with total latency no more than
$\frac{4}{3+\beta}$ times that of the minimum-latency flow, where $\beta$ is the
fraction of centrally controlled traffic and $m$ is the number of edges. We again
show that no stronger guarantee is possible.

Finally, we consider the optimization problem of computing the strategy inducing the
Stackelberg equilibrium and show that it is NP-hard, even in the special case of
networks of parallel links with linear latency functions.

6.1.2  Comparison  to  the  Price  of  Anarchy 

The results of this chapter give a (sharp) trade-off between a minimum-latency flow
and a flow at Nash equilibrium (as a function of the fraction of the traffic that is
centrally controlled) in networks of parallel links, in the following sense. In
Chapter 3 we showed that a flow at Nash equilibrium can be arbitrarily more costly
than the minimum-latency flow, but if every edge latency function is linear then the
total latency of a Nash flow is no more than $\frac{4}{3}$ times that of a
minimum-latency flow. Thus, the results of this chapter reduce to those of Chapter 3
when $\beta = 0$, give the trivial result that the Stackelberg equilibrium for
$\beta = 1$ is the minimum-latency flow, and quantify the worst possible ratio
between the cost of the Stackelberg equilibrium (in some sense, a “mixture” of a Nash
flow and a minimum-latency flow) and the cost of a minimum-latency flow for all
intermediate values of $\beta$.
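
The trade-off can be made concrete by tabulating the two guarantees, $1/\beta$ for general latency functions and $\frac{4}{3+\beta}$ for linear ones, across values of $\beta$. The helper below is purely illustrative; its name and the `linear` flag are our own.

```python
def stackelberg_guarantee(beta, linear=False):
    """Worst-case ratio for networks of parallel links, as a function of beta."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    if linear:
        return 4.0 / (3.0 + beta)           # linear latency functions
    return float("inf") if beta == 0.0 else 1.0 / beta   # general latency functions

for beta in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"beta={beta:4.2f}  general={stackelberg_guarantee(beta):6.3f}  "
          f"linear={stackelberg_guarantee(beta, linear=True):6.3f}")
```

At $\beta = 0$ the two columns recover the unbounded ratio and the $\frac{4}{3}$ bound for Nash flows, and at $\beta = 1$ both equal 1.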

Our  approach  also  adds  an  algorithmic  dimension  to  our  work  in  Part  II  bounding 
the  price  of  anarchy,  in  that  one  aspect  of  our  analysis  of  Stackelberg  equilibria  is  the 
design  of  algorithms  for  efficiently  computing  good  Stackelberg  strategies.  Further, 
while  optimal  and  Nash  flows  can  be  characterized  and  computed  efficiently  via 
convex  programming  (see  Propositions  2.3.1  and  2.5.1) — a  fact  that  was  crucial  for 
our  work  bounding  the  price  of  anarchy — the  hardness  result  of  this  chapter  implies 
that  no  such  characterization  of  Stackelberg  equilibria  is  possible.  With  the  central 
approach  of  Part  II  ruled  out,  we  will  require  new  techniques  for  bounding  the 
inefficiency  of  Stackelberg  equilibria. 

6.1.3  Related  Work 

Stackelberg games and Stackelberg equilibria have been thoroughly studied in the game
theory literature (see, for example, [75] or [13, §3.6] for an introduction and [184]
for their origin) and have previously been applied to problems in competitive
facility location [153], networking [53, 54, 58, 104], and more general
continuous-time systems (see the surveys of Cruz [41, 42], the book of Bagchi [12],
and the references therein). With the exception of [53, 104], however, the
leader/follower hierarchy has been used to model classes of selfish users with
different priority levels; this setting differs from ours in that no user is
interested in optimizing system performance.¹ The note of Douligeris and
Mazumdar [53] is concerned only with experimental results on the effectiveness of
Stackelberg strategies. The paper of Korilis et al. [104], while more similar in
spirit to ours, focuses on deriving necessary and sufficient conditions (on the
number of selfish users, the fraction of the traffic that is centrally controlled,
etc.) for the existence of a leader strategy inducing an optimal routing of all of
the traffic; moreover, only one type of latency function is considered. By contrast,
we are interested in simple leader strategies that always induce optimal or
near-optimal behavior from the network users for any set of latency functions.

6.1.4  Organization 

In Section 6.2 we extend our basic traffic model to accommodate Stackelberg
equilibria. In Section 6.3 we introduce three simple algorithms for computing
Stackelberg strategies in networks of parallel links. In Sections 6.4 and 6.5, we
prove that our third algorithm achieves the best-possible worst-case performance
guarantee for networks of parallel links with general and linear latency functions,
respectively. In Section 6.6, we prove that computing the optimal strategy is
NP-hard, even in networks of parallel links with linear latency functions.

6.2  Stackelberg  Strategies  and  Induced  Equilibria 

In this section we define our notion of a Stackelberg game and consider two examples.
We will focus on networks in which all traffic shares a common source and
destination, although the definitions of this section are easily extended to a
multicommodity setting.

Recall  we  desire  a  hierarchical  game,  where  a  leader  routes  centrally  controlled 
traffic  and,  holding  this  strategy  fixed,  the  network  users  react  in  a  noncooperative 
and  selfish  manner.  This  idea  is  formalized  in  the  next  two  definitions.  By  a 
Stackelberg instance $(G, r, \ell, \beta)$, we mean a single-commodity instance $(G, r, \ell)$ in
the sense of Section 2.1 together with a parameter $\beta \in [0, 1]$ specifying the fraction
of the network traffic that is centrally controlled.

1  Traditionally,  Stackelberg  games  model  selfish  users  with  asymmetric  roles;  our  use  of  them  is 
somewhat  unconventional. 

Definition 6.2.1 A (Stackelberg) strategy for the Stackelberg instance $(G, r, \ell, \beta)$ is
a flow feasible for $(G, \beta r, \ell)$.

Definition 6.2.2 Let $f$ be a strategy for Stackelberg instance $(G, r, \ell, \beta)$, and define
$\tilde{\ell}$ by $\tilde{\ell}_e(x) = \ell_e(f_e + x)$ for each edge $e \in E$. An equilibrium induced by strategy $f$
is a flow $g$ at Nash equilibrium for the instance $(G, (1 - \beta)r, \tilde{\ell})$. We then say that
$f + g$ is a flow induced by $f$ for $(G, r, \ell, \beta)$.

Existence and essential uniqueness of induced equilibria follow easily from
Propositions 2.2.4 and 2.5.1.

Proposition  6.2.3  Let  f  be  a  strategy  for  a  Stackelberg  instance  with  continuous, 
nondecreasing  latency  functions.  Then  there  exists  a  flow  induced  by  f,  and  any  two 
such  induced  flows  have  equal  cost. 

The  following  simple  observation  will  be  useful  in  Sections  6.4  and  6.5. 

Lemma 6.2.4 Let $f$ be a strategy for Stackelberg instance $(G, r, \ell, \beta)$ inducing
equilibrium $g$, where $G$ is a network of parallel links. Let $G'$ denote the subgraph induced
by the edges $e$ on which $g_e > 0$. Then $f + g$, restricted to the edges of $G'$, is a
flow at Nash equilibrium for the instance $(G', r', \ell)$, where $r' = \sum_{e \in G'} (f_e + g_e)$. In
particular, all edges $e$ on which $g_e > 0$ have a common latency with respect to $f + g$.

We next consider two examples that demonstrate both the usefulness and the
limitations of Stackelberg strategies. First consider Pigou’s example (Subsection 1.2.1).
Recall that in the absence of centrally controlled traffic, a Nash flow incurs total
latency $\frac{4}{3}$ times that of the optimal flow. Suppose instead that half of the traffic
is controlled by the network manager (i.e., that $\beta = \frac{1}{2}$) and consider the strategy
$f$ of routing all centrally controlled traffic on the top edge (the edge with latency
function $\ell(x) = 1$). Then, as all remaining traffic will be routed on the lower edge
in the equilibrium induced by $f$, the flow induced by $f$ is precisely the minimum-latency
flow. Thus, in this particular instance, total latency can be minimized via
a Stackelberg strategy.

Now consider a small modification to Pigou’s example, in which we replace the
latency function of the lower edge with the latency function $\ell(x) = 2x$. The flow
at Nash equilibrium puts half of the traffic on each edge for a cost of 1, while the
optimal flow routes only $\frac{1}{4}$ of the traffic on the lower edge, for a cost of $\frac{7}{8}$. On the
other hand, if we again allow the network manager to route half of the traffic, we
see that for any strategy $f$, the flow induced by $f$ is the flow at Nash equilibrium
and hence is not optimal. In this example, there is no available strategy by which
the network manager can improve network performance.
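
The two examples can be checked with a few lines of exact arithmetic. The following is a minimal sketch added for illustration, not part of the original analysis. It assumes a two-link network with traffic rate 1 whose top edge has latency 1 and whose bottom edge has latency $a x$ (so $a = 1$ is Pigou's example and $a = 2$ is the modification); the function names `induced_cost` and `optimal_cost` are hypothetical. The closed form for the induced flow relies on the observation that selfish traffic keeps choosing the bottom edge until its latency reaches 1.

```python
from fractions import Fraction as F

def induced_cost(a, beta, on_top):
    """Cost of the flow induced when the leader routes `on_top` units on the
    top edge (latency 1) and beta - on_top units on the bottom edge
    (latency a*x). The total rate is 1, so selfish users control 1 - beta."""
    a, beta, on_top = F(a), F(beta), F(on_top)
    leader_bottom = beta - on_top
    # Selfish traffic prefers the bottom edge while its latency is below 1,
    # i.e. while the bottom edge carries fewer than 1/a units of flow.
    bottom = max(leader_bottom, min(1 / a, leader_bottom + (1 - beta)))
    top = 1 - bottom
    return top * 1 + bottom * (a * bottom)

def optimal_cost(a):
    """Minimum total latency: equalize the marginal costs 1 and 2*a*x."""
    a = F(a)
    bottom = min(F(1), 1 / (2 * a))
    return (1 - bottom) + bottom * (a * bottom)

half = F(1, 2)
# Pigou's example (a = 1): putting the leader's half on the top edge is optimal.
print(induced_cost(1, half, half), optimal_cost(1))
# Modified example (a = 2): every split of the leader's traffic induces cost 1.
print([induced_cost(2, half, F(k, 8)) for k in range(5)], optimal_cost(2))
```

Under these assumptions the first line reports equal induced and optimal costs, while in the modified example every leader split yields the Nash cost, which exceeds the optimal cost.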

6.3  Three  Stackelberg  Strategies 

6.3.1  Two  Natural  Strategies 

We begin our investigation of Stackelberg strategies for networks of parallel links by
considering two natural approaches that provide suboptimal performance guarantees.
To motivate our results in the simplest possible way, throughout this subsection
we will consider examples with linear latency functions in which half of the traffic
is centrally controlled ($\beta = \frac{1}{2}$).

First consider the following strategy for an instance $(G, r, \ell, \frac{1}{2})$: if $f^*$ is the
minimum-latency flow for instance $(G, \frac{1}{2}r, \ell)$, put $f = f^*$. In words, we choose
the strategy of minimum cost (ignoring the existence of traffic that is not centrally
controlled). We call this the Aloof strategy since it refuses to acknowledge the rest
of the traffic in the network. Pigou’s example (Subsection 1.2.1) shows that this
strategy performs quite poorly: in that network, the Aloof strategy routes all flow
on the bottom edge (the edge with latency function $\ell(x) = x$) and in the induced
flow all traffic is routed on the bottom edge. Thus, the Aloof strategy induces the
(inefficient) Nash flow while the strategy that routes all centrally controlled traffic
on the top edge induces the optimal flow. Applying this type of argument to the
nonlinear variant of Pigou’s example (Subsection 2.4.4), it is also easy to see that
the Aloof strategy can perform arbitrarily badly in networks with general latency
functions.

A second attempt at a good strategy might be as follows: if $f^*$ is the minimum-latency
flow for $(G, r, \ell)$, put $f = \frac{1}{2} f^*$. We call this the Scale strategy, since it is
simply the optimal flow, suitably scaled. To understand why we should be dissatisfied
with the Scale strategy, consider the two-node, two-link example with latency
functions $\ell_1(x) = 1$ and $\ell_2(x) = \frac{3}{2}x$ and traffic rate 1. The minimum-latency flow
routes $\frac{2}{3}$ of the flow on the first link and the rest on the second link (with total cost
$\frac{5}{6}$), so the Scale strategy will route $\frac{1}{3}$ units of flow on the first link and $\frac{1}{6}$ units on
the second link. All selfish traffic is then routed on the second link, inducing the
flow with $\frac{1}{3}$ units of flow on the first link and the rest on the second, having cost
1. It would appear that the Scale strategy routed too much flow on the second link
(since selfish traffic flocked to it anyway); indeed, the strategy that instead routes
all centrally controlled traffic on the first link induces a flow with the superior cost
of $\frac{7}{8}$.2
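
The arithmetic behind this example can be spelled out as follows. This worked check is added for illustration; it assumes $\ell_1(x) = 1$, $\ell_2(x) = \frac{3}{2}x$, rate 1 and $\beta = \frac{1}{2}$, and writes $x_2$ for the flow on the second link.

```latex
\begin{align*}
\text{optimal flow:}\quad
  & \tfrac{d}{dx_2}\bigl[x_2\,\ell_2(x_2)\bigr] = 3x_2 = 1
    \;\Rightarrow\; x_2 = \tfrac13, \quad
    C(f^*) = \tfrac23 \cdot 1 + \tfrac13 \cdot \tfrac12 = \tfrac56, \\
\text{Scale:}\quad
  & f = \tfrac12 f^* = \bigl(\tfrac13, \tfrac16\bigr), \quad
    \ell_2\bigl(\tfrac16 + \tfrac12\bigr) = \tfrac32 \cdot \tfrac23 = 1
    \;\Rightarrow\; C = \tfrac13 \cdot 1 + \tfrac23 \cdot 1 = 1, \\
\text{all on link 1:}\quad
  & f = \bigl(\tfrac12, 0\bigr), \quad
    \ell_2\bigl(\tfrac12\bigr) = \tfrac34 < 1
    \;\Rightarrow\; C = \tfrac12 \cdot 1 + \tfrac12 \cdot \tfrac34 = \tfrac78.
\end{align*}
```

In the Scale case the second link reaches latency exactly 1 once all selfish traffic has joined it, which is why the induced cost equals the Nash cost.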

6.3.2  The  Largest  Latency  First  (LLF)  Strategy 

Intuitively,  both  the  Aloof  and  Scale  strategies  suffer  from  a  common  flaw:  both 
route  traffic  on  edges  that  will  subsequently  be  inundated  in  any  induced  equilibrium 
while  routing  too  little  traffic  on  edges  that  selfish  users  are  prone  to  ignore.  This 
observation  suggests  that  a  good  strategy  should  give  priority  to  the  edges  that 
are  least  appealing  to  selfish  users — edges  with  relatively  high  latency.  With  this 
intuition in mind, the following strategy for a Stackelberg instance $(G, r, \ell, \beta)$ defined
on a network of $m$ parallel links (which we call the Largest Latency First or LLF
strategy) should seem natural:

(1) Compute a minimum-latency flow $f^*$ for $(G, r, \ell)$.

(2) Label the edges of $G$ from 1 to $m$ so that $\ell_1(f_1^*) \le \cdots \le \ell_m(f_m^*)$.

(3) Let $k \le m$ be minimal with $\sum_{i=k+1}^{m} f_i^* \le \beta r$.

(4) Put $f_i = f_i^*$ for $i > k$, $f_k = \beta r - \sum_{i=k+1}^{m} f_i^*$, and $f_i = 0$ for $i < k$.

We will say that an edge $i$ is saturated by a strategy $f$ if $f_i = f_i^*$. Thus, the LLF
strategy saturates edges one-by-one (in order from the largest latency with respect to
$f^*$ to the smallest) until there is no centrally controlled traffic remaining. Note that
as long as all link latency functions are standard (see Definition 2.3.5), Fact 2.3.6
implies that the LLF strategy can be computed in polynomial time (the bottleneck
is step (1)); in Section 6.5 we will see that it can be computed in $O(m^2)$ time when
every latency function is linear.3
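
Steps (2) through (4) of the LLF strategy amount to a greedy pass over the edges. The sketch below is illustrative only: it assumes step (1) has already been carried out, so that a minimum-latency flow and the edge latencies under that flow are supplied as inputs, and the name `llf_strategy` and its parameters are hypothetical.

```python
from fractions import Fraction as F

def llf_strategy(opt_flow, opt_latency, beta, rate=1):
    """Steps (2)-(4) of LLF, given the output of step (1).

    opt_flow[i]    -- flow on edge i in a minimum-latency flow f*
    opt_latency[i] -- latency of edge i under f*
    Returns the leader's flow as a list indexed like opt_flow."""
    budget = F(beta) * F(rate)              # centrally controlled traffic
    strategy = [F(0)] * len(opt_flow)
    # Visit edges from largest to smallest latency with respect to f*.
    by_latency = sorted(range(len(opt_flow)),
                        key=lambda i: opt_latency[i], reverse=True)
    for i in by_latency:
        strategy[i] = min(F(opt_flow[i]), budget)   # saturate if possible
        budget -= strategy[i]
    return strategy

# Pigou's example: edge 0 has latency 1, edge 1 has latency x, f* = (1/2, 1/2).
opt_flow = [F(1, 2), F(1, 2)]
opt_latency = [F(1), F(1, 2)]
print(llf_strategy(opt_flow, opt_latency, F(1, 2)))
print(llf_strategy(opt_flow, opt_latency, F(3, 4)))
```

With half of the traffic controlled, the sketch places all of it on the constant-latency edge; with three quarters controlled, it saturates that edge and puts the remainder on the other one.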

The  next  two  sections  are  devoted  to  proving  that  the  LLF  strategy  always 
induces  a  flow  with  near-optimal  total  latency. 

2 In addition, a slightly more complicated example shows that the Scale strategy can perform
arbitrarily badly in networks with general latency functions.

3 We ignore the fact that an optimal flow can only be computed up to an arbitrarily small
additive factor (see Fact 2.3.6), as this does not affect our results in any significant way.

6.4 Arbitrary Latency Functions: A Performance
Guarantee of $1/\beta$

In this section we prove that the LLF strategy induces a near-optimal flow for
networks of parallel links with arbitrary latency functions. We note that no performance
guarantee  is  possible  for  such  networks  in  the  absence  of  centrally  controlled  traffic: 
without  additional  restrictions  on  edge  latency  functions,  the  Nash  flow  may  incur 
arbitrarily  more  latency  than  the  optimal  flow  (as  shown  by  the  nonlinear  variant 
of  Pigou’s  example).  Thus,  the  benefit  of  a  leader  (and  of  a  carefully  chosen  leader 
strategy)  is  particularly  striking  in  this  general  setting. 

A simple variation on previous examples demonstrates the limits of Stackelberg
strategies. In a two-node, two-link instance with $\beta = \frac{1}{2}$ and latency functions
$\ell_1(x) = 1$ and $\ell_2(x) = 2^p x^p$ for $p \in \mathbb{Z}^+$, any Stackelberg strategy induces the Nash
flow (half of the flow on each link, with cost 1) while the minimum-latency flow
routes $\frac{1}{2} + \delta_p$ units of flow on the first link and the rest on the second link, for a cost
of $\frac{1}{2} + \epsilon_p$ with $\delta_p, \epsilon_p \to 0$ as $p \to \infty$. Thus the best induced flow may be (arbitrarily
close to) twice as costly as the optimal flow. Similar examples show that for any
$\beta \in (0, 1)$, the best induced flow may be $\frac{1}{\beta}$ times as costly as the optimal flow.
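
The convergence claimed in this example can be observed numerically. The sketch below is an illustration under stated assumptions: it takes the two-link instance with latencies 1 and $(2x)^p$ and rate 1, and it uses a closed form for the optimal split derived here from the first-order condition (not quoted from the text). The function name `pigou_variant` is hypothetical.

```python
def pigou_variant(p):
    """Two links, rate 1: l1(x) = 1 and l2(x) = (2x)**p.

    Returns (optimal flow on link 1, optimal cost, Nash cost / optimal cost)."""
    # First-order condition on link 2: d/dx [x * (2x)**p] = (p + 1) * (2x)**p = 1.
    x2 = 0.5 * (p + 1) ** (-1.0 / p)
    opt_cost = (1.0 - x2) + x2 * (2.0 * x2) ** p
    nash_cost = 1.0    # half of the flow on each link, both at latency 1
    return 1.0 - x2, opt_cost, nash_cost / opt_cost

for p in (1, 2, 10, 100, 1000):
    x1, opt_cost, ratio = pigou_variant(p)
    print(p, round(x1, 4), round(opt_cost, 4), round(ratio, 4))
```

As $p$ grows, the optimal flow on the first link and the optimal cost both approach one half from above, so the printed ratio climbs toward 2 without reaching it.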

The main result of this section is that the LLF strategy always induces a flow
of cost no more than $\frac{1}{\beta}$ times that of the minimum-latency flow. A rough outline
of  the  proof  is  as  follows.  Our  goal  is  to  exploit  the  iterative  structure  of  the  LLF 
strategy  and  proceed  by  induction  on  the  number  of  edges.  If  the  LLF  strategy 
first  saturates  the  mth  edge  (with  the  ordering  of  edges  as  in  the  description  of  the 
LLF  strategy),  a  natural  idea  is  to  apply  the  inductive  hypothesis  to  the  remainder 
of  the  LLF  strategy  on  the  first  m  —  1  edges  to  derive  a  performance  guarantee. 
This  idea  nearly  succeeds,  but  there  are  two  difficulties.  First,  it  is  possible  that 
the  LLF  strategy  fails  to  saturate  any  edges;  we  will  see  below  that  this  case  is  easy 
to  analyze  and  causes  no  trouble.  Second,  in  order  to  obtain  a  clean  application 
of  the  inductive  hypothesis  to  the  first  m  —  1  edges,  we  require  that  the  optimal 
and  LLF-induced  flows  route  the  same  total  amount  of  flow  on  these  edges — i.e., 
that the LLF-induced equilibrium eschews the $m$th edge.4 We resolve this difficulty
with the following lemma, which states that if the LLF strategy saturates the $m$th
edge, then some induced equilibrium routes all traffic on the first $m - 1$ edges; this
suffices for our purposes, since different induced flows have equal cost.

4 This can fail in trivial cases, such as in a two-node, two-link network in which both edges have
latency function $\ell(x) = 1$.

Lemma 6.4.1 Let $(G, r, \ell, \beta)$ denote a Stackelberg instance on a network of $m$ parallel
links with optimal flow $f^*$, and label the edges of $G$ from 1 to $m$ so that
$\ell_m(f_m^*) \ge \ell_i(f_i^*)$ for all $i \in \{1, 2, \ldots, m\}$. If $f$ is a strategy with $f_m = f_m^*$, then
there exists an induced equilibrium $g$ with $g_m = 0$.

Proof.  Consider  an  arbitrary  induced  equilibrium  g  and  suppose  gm  >  0.  Roughly 
speaking,  the  idea  is  to  prove  that  this  scenario  only  occurs  when  several  latency 
functions (that of the mth edge, and others) are locally constant; then, traffic routed
on  edge  m  in  the  induced  equilibrium  can  be  evacuated  to  other  edges  with  locally 
constant  latency  functions  to  provide  a  new  induced  equilibrium. 

Formally, let $L = \ell_m(f_m + g_m) = \ell_m(f_m^* + g_m)$ denote the common latency with
respect to $f + g$ of every edge with $g_i > 0$ (see Lemma 6.2.4). We must have
$\ell_m(f_m^*) = L$; otherwise $\ell_i(f_i^*) < L$ for all $i$ yet $\ell_i(f_i + g_i) \ge L$ for all $i$, contradicting
that $f^*$ and $f + g$ are flows at the same rate. Thus, since $\ell_m$ is nondecreasing, $\ell_m$ is
locally constant: $\ell_m$ is equal to $L$ on $[f_m^*, f_m^* + g_m]$.

Next, let $E'$ denote the subgraph of edges on which $f_i + g_i < f_i^*$; since
$f_m + g_m > f_m^*$, $E'$ is non-empty. For $i \in E'$, we have $\ell_i(f_i + g_i) \ge L$, $\ell_i(f_i^*) \le \ell_m(f_m^*) = L$,
and $\ell_i$ nondecreasing; hence, $\ell_i$ is equal to $L$ on $[f_i + g_i, f_i^*]$. Since $f^*$ and $f + g$ are
flows at the same rate, we must have $\sum_{i \in E'} [f_i^* - (f_i + g_i)] \ge g_m$. Finally, consider
modifying $g$ as follows: move all traffic previously routed on edge $m$ to edges in
$E'$, subject to the constraint $f_i + g_i \le f_i^*$. We have already observed that there is
sufficient “room” on edges in $E'$ for this operation, and that all latency functions are
constant in the domain of our modifications. We have thus exhibited a new induced
equilibrium with no traffic routed on edge $m$, completing the proof. ■

We  are  now  prepared  to  prove  the  main  result  of  this  section. 

Theorem 6.4.2 Let $\mathcal{I} = (G, r, \ell, \beta)$ denote a Stackelberg instance on a network of
parallel links. If $f$ is an LLF strategy for $\mathcal{I}$ inducing equilibrium $g$ and $f^*$ is a
minimum-latency flow for the instance $(G, r, \ell)$, then $C(f + g) \le \frac{1}{\beta} C(f^*)$.

Proof. We proceed by induction on the number of edges $m$ (for each fixed $m$, we
will prove the theorem for arbitrary $\ell$, $r$, and $\beta$). The case of a single edge is trivial.

Fix a Stackelberg instance $\mathcal{I} = (G, r, \ell, \beta)$ with at least two edges, and let $f^*$
denote a minimum-latency flow for the instance $(G, r, \ell)$ and $f$ the corresponding
LLF strategy. Label the edges 1 to $m$ so that $\ell_1(f_1^*) \le \ell_2(f_2^*) \le \cdots \le \ell_m(f_m^*)$.
By scaling, we may assume that $r = 1$ (use latency functions $\tilde{\ell}$ with $\tilde{\ell}_i(x) = \ell_i(rx)$
otherwise). Let $L$ denote the common latency with respect to $f + g$ of every edge
with $g_i > 0$ (see Lemma 6.2.4).

Case 1: Suppose $g_k = 0$ for some edge $k$. Let $E_1$ denote the edges $i$ for which
$g_i = 0$ and $E_2$ the edges for which $g_i > 0$; both of these sets are non-empty. Let
$G_1$ and $G_2$ denote the corresponding subgraphs of $G$. For $j = 1, 2$ let $\beta_j$ denote the
amount of centrally controlled traffic routed on edges in $E_j$ and $C_j$ the cost incurred
by $f + g$ on edges in $E_j$. By Lemma 6.2.4, $C_2 = (1 - \beta_1)L$ and $C_1 \ge \beta_1 L$. Now, $f^*$
restricted to $E_2$ is an optimal flow for $(G_2, 1 - \beta_1, \ell)$ and hence $f$ restricted to $E_2$ is an
LLF strategy for the instance $\mathcal{I}_2 = (G_2, 1 - \beta_1, \ell, \beta')$ where $\beta' = \frac{\beta_2}{1 - \beta_1}$. Applying the
inductive hypothesis to $\mathcal{I}_2$ and using the fact that $f_i^* \ge f_i = f_i + g_i$ for all $i \in E_1$,
we obtain

$$C(f^*) \ge C_1 + \beta' C_2.$$

Proving that $C(f + g) \le \frac{1}{\beta} C(f^*)$ thus reduces to showing

$$\beta (C_1 + C_2) \le C_1 + \beta' C_2.$$

Since $\beta \le 1$ and $C_1 \ge \beta_1 L$, it suffices to prove this inequality with $C_1$ replaced by
$\beta_1 L$. Writing $C_2 = (1 - \beta_1)L$ and $\beta' = \frac{\beta_2}{1 - \beta_1}$ and dividing through by $L$, we need
only check that

$$\beta (\beta_1 + (1 - \beta_1)) \le \beta_1 + \frac{\beta_2}{1 - \beta_1} (1 - \beta_1),$$

which clearly holds (both sides are equal to $\beta$).

Case 2: Suppose $g_i > 0$ for every edge $i$, so $C(f + g) = L$. We may assume that
the LLF strategy failed to saturate edge $m$ (otherwise, by Proposition 6.2.3, we can
finish by applying the previous case to the better-behaved induced flow guaranteed
by Lemma 6.4.1). Thus, $\beta < f_m^*$.

As in the proof of Lemma 6.4.1, we must have $\ell_m(f_m^*) \ge L$; otherwise, $\ell_i(f_i^*) < L$
for all edges $i$ while $\ell_i(f_i + g_i) = L$ for all $i$, contradicting that $f^*$ and $f + g$ are flows
at the same rate. Having established that edge $m$ has large latency with respect to
$f^*$ and that $f_m^*$ is fairly large, it is now a simple matter to lower bound $C(f^*)$:

$$C(f^*) \ge f_m^* \, \ell_m(f_m^*) \ge \beta L = \beta \, C(f + g).$$

Remark 6.4.3 Theorem 6.4.2 applies only to networks of parallel links, and the
proof makes crucial use of the “decomposable” nature of such networks. On the
other hand, the guarantee of Theorem 6.4.2 cannot be extended to general network
topologies (cf. our work in Sections 3.3-3.4); even in the four-node network of
Braess’s Paradox (Figure 2.2), a network manager controlling a $\beta$ fraction of the
traffic cannot in general induce a flow with cost at most a $\frac{1}{\beta}$ factor times that of the
minimum-latency flow (see Section B.3).

6.5 Linear Latency Functions: A Performance
Guarantee of $4/(3 + \beta)$

6.5.1  Properties  of  the  Nash  and  Optimal  Flows 

In this subsection we undertake a deeper study of Nash and optimal flows for
instances on networks of parallel links with linear latency functions. The results of
this subsection will be instrumental in proving a stronger performance guarantee for
the LLF strategy for these instances.

Fix a network $G$ of $m$ parallel links with linear latency functions (so that
$\ell_e(x) = a_e x + b_e$ for each $e \in E$ with $a_e, b_e \ge 0$) and label them from 1 to $m$ so that
$b_1 \le b_2 \le \cdots \le b_m$. We may assume that at most one edge has a constant latency
function ($a_i = 0$) since all but the one with smallest $b$-value can be safely discarded;
under this assumption, the Nash and optimal flows are always unique. We may
similarly assume that an edge with a constant latency function is the $m$th edge.

Our first goal is to understand the structure of the Nash flow $f$ as a function
of the rate $r$. It is useful to imagine $r$ increasing from 0 to a large value, with the
corresponding Nash flow changing in a continuous fashion; an intuitive description
of this process is as follows. Initially, when $r$ is nearly zero, all traffic is routed on the
edge having the smallest constant term. Once the first edge is sufficiently congested,
the second edge looks equally attractive (this occurs when $a_1 f_1 + b_1 = b_2$, i.e., when
the amount of flow on the first edge is $\frac{b_2 - b_1}{a_1}$). Subsequent traffic will be routed on
both of the first two edges, at rates proportional to $\frac{1}{a_1}$ and $\frac{1}{a_2}$ (traffic will be routed
so that these two edges continue to have equal latency). Once $(b_3 - b_2)(\frac{1}{a_1} + \frac{1}{a_2})$
further units of traffic have been routed on the first two edges, the third edge will be
equally attractive and new traffic will be spread out among the first three edges, and
so on. We may thus envision the Nash flow as being constructed in phases: within
phase $i$ traffic is routed on the first $i$ edges according to fixed relative proportions
and at the end of the phase (after enough new flow has been routed) an additional
edge is put into use.

We now formalize this intuitive description of the Nash flow $f$. For $i = 1, \ldots, m$,
let $v_i$ denote the $m$-vector $(\frac{1}{a_1}, \frac{1}{a_2}, \ldots, \frac{1}{a_i}, 0, 0, \ldots, 0) \in \mathbb{R}^m$; if $a_m = 0$ put
$v_m = (0, 0, \ldots, 1)$. The vector $v_i$ should be interpreted as a specification of the way traffic is
routed on the first $i$ edges during the $i$th phase. Next, define $\delta_i$ for $i = 0, 1, \ldots, m - 1$
inductively by $\delta_0 = 0$ and $\delta_i = \min\{(b_{i+1} - b_i)\|v_i\|_1,\; r - \sum_{j=0}^{i-1} \delta_j\} \ge 0$ (where $\|\cdot\|_1$
denotes the $L_1$-norm of a vector). We also put $\delta_m = r - \sum_{i=0}^{m-1} \delta_i$. The scalar $\delta_i$
should be interpreted as the total amount of traffic routed in the $i$th phase. We can
then describe $f$ as follows (where we interpret the $m$-vector $f$ as a function on the
edges of $G$ in the obvious way).


Lemma 6.5.1 Let $\mathcal{I}$ be an instance on a network of $m$ parallel links with linear
latency functions, as above. Then the Nash flow for $\mathcal{I}$ is given by

$$f = \sum_{i=1}^{m} \delta_i \frac{v_i}{\|v_i\|_1}.$$

Our characterization of optimal flows (Proposition 2.3.1 and Corollary 2.3.2)
yields an analogous result for computing them by an explicit formula. Note that
when a latency function has the form $\ell(x) = ax + b$, the corresponding marginal cost
function (see Section 2.3) is $\ell^*(x) = 2ax + b$. Recalling from Corollary 2.3.2 that an
optimal flow is simply a flow at Nash equilibrium with respect to latency functions
$\ell^*$, we see that the optimal flow is created by the same process as the Nash flow,
except that new edges are incorporated at a more rapid pace so as to spread traffic
over a wider range of edges (and thus achieve a smaller total latency).

Formally, let $v_i$ be as above and define $\delta_i^*$ inductively by $\delta_0^* = 0$,
$\delta_i^* = \min\{\frac{1}{2}(b_{i+1} - b_i)\|v_i\|_1,\; r - \sum_{j=0}^{i-1} \delta_j^*\}$, and $\delta_m^* = r - \sum_{i=0}^{m-1} \delta_i^*$. Letting $f^*$ denote the
minimum-latency flow for $(G, r, \ell)$, the analogue of Lemma 6.5.1 is as follows.


Lemma 6.5.2 Let $\mathcal{I}$ be an instance on a network of $m$ parallel links with linear
latency functions, as above. Then the optimal flow for $\mathcal{I}$ is given by

$$f^* = \sum_{i=1}^{m} \delta_i^* \frac{v_i}{\|v_i\|_1}.$$

Lemmas 6.5.1 and 6.5.2 have several useful corollaries. We summarize them below.

Corollary 6.5.3 Let $G$ be a network of $m$ parallel links with linear latency functions,
with at most one edge having constant latency. Label the edges of $G$ from 1
to $m$ so that if $\ell_i(x) = a_i x + b_i$, then $b_i$ is nondecreasing in $i$. Then:

(a) If $f^*$ and $f$ denote optimal and Nash flows for $(G, r, \ell)$, then $f_m^* \ge f_m$.

(b) If $f^*$ and $f$ denote optimal and Nash flows for $(G, r, \ell)$, then $f_1^* \le f_1 \le 2 f_1^*$.

(c) If $f^*$ is the optimal flow for $(G, r, \ell)$ and $\tilde{f}^*$ is the optimal flow for $(G, \tilde{r}, \ell)$
with $\tilde{r} \ge r$, then $\tilde{f}_i^* \ge f_i^*$ for each edge $i$.

(d) For any rate $r$, the optimal and Nash flows for $(G, r, \ell)$ can be computed in
$O(m^2)$ time.


Proof. Parts (c) and (d) are immediate from Lemmas 6.5.1 and 6.5.2. For the
remaining parts, fix an instance $(G, r, \ell)$ and define $v_i$, $\delta_i$, and $\delta_i^*$ as in Lemmas 6.5.1
and 6.5.2. Our first observation is that, for any $i \in \{1, 2, \ldots, m\}$, the Nash flow
routes at least as much traffic in the first $i$ phases as the optimal flow; formally,
$\sum_{k=1}^{i} \delta_k \ge \sum_{k=1}^{i} \delta_k^*$ for all $i$. Since $\sum_{k=1}^{m} \delta_k = \sum_{k=1}^{m} \delta_k^* = r$, we obtain $\delta_m^* \ge \delta_m$
and hence $f_m^* \ge f_m$, proving (a).

It remains to prove part (b) of the corollary. We may assume that $m > 1$
(otherwise $f_1 = f_1^* = r$). Letting $m'$ equal $m$ if there is no edge with constant
latency function and $m - 1$ otherwise, Lemmas 6.5.1 and 6.5.2 give

$$f_1^* = \frac{1}{a_1} \sum_{i=1}^{m'} \frac{\delta_i^*}{\|v_i\|_1}$$

and

$$f_1 = \frac{1}{a_1} \sum_{i=1}^{m'} \frac{\delta_i}{\|v_i\|_1}.$$

By the definitions of $\delta_i$ and $\delta_i^*$, we have $\delta_i \le 2\delta_i^*$ for $i = 1, 2, \ldots, m'$ and hence
$f_1 \le 2 f_1^*$. For the other inequality, we recall that $\sum_{k=1}^{i} \delta_k \ge \sum_{k=1}^{i} \delta_k^*$ for each $i$ and
observe that $\|v_i\|_1$ is increasing in $i$ (for $i \in \{1, 2, \ldots, m'\}$); it follows that $f_1^* \le f_1$.


Corollary 6.5.3(d) implies that the LLF strategy can be computed in $O(m^2)$ time
for instances with linear latency functions.

6.5.2 Proof of Performance Guarantee

In Section 6.2 we saw an example with linear latency functions and $\beta = \frac{1}{2}$ in which
no strategy can induce a flow with cost less than $\frac{8}{7}$ times that of the minimum-latency
flow. This example is easily modified (by giving the second edge the latency
function $\ell_2(x) = \frac{1}{1 - \beta}x$) to show that, for any $\beta \in (0, 1)$, the minimum-cost induced
flow for a Stackelberg instance $(G, r, \ell, \beta)$ with linear latency functions may be $\frac{4}{3 + \beta}$
times as costly as the minimum-latency flow for $(G, r, \ell)$. The main result of this
section is a matching upper bound for the LLF strategy.
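
The ratio in this lower-bound example follows from a short computation. The derivation below is added as an illustration; it assumes the two-link instance with rate 1, $\ell_1(x) = 1$ and $\ell_2(x) = \frac{x}{1 - \beta}$, and writes $x_2$ for the flow on the second link.

```latex
\begin{align*}
\text{Nash flow:}\quad
  & \ell_2(x_2) = \frac{x_2}{1 - \beta} = 1
    \;\Rightarrow\; x_2 = 1 - \beta, \quad
    C = \beta \cdot 1 + (1 - \beta) \cdot 1 = 1, \\
\text{optimal flow:}\quad
  & \frac{2 x_2}{1 - \beta} = 1
    \;\Rightarrow\; x_2 = \frac{1 - \beta}{2}, \quad
    C(f^*) = \frac{1 + \beta}{2} + \frac{1 - \beta}{2} \cdot \frac12
           = \frac{3 + \beta}{4}.
\end{align*}
```

Even if the leader places all $\beta$ units on the first link, the remaining $1 - \beta$ units of selfish traffic fill the second link up to latency 1, so the induced flow coincides with the Nash flow and the ratio is $1 \big/ \frac{3 + \beta}{4} = \frac{4}{3 + \beta}$.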

Before  proving  this  result,  we  give  an  alternative  description  of  LLF  that  is  more 
convenient  for  our  analysis.  This  description  is  based  on  the  following  lemma. 

Lemma 6.5.4 Let $f^*$ be an optimal flow for instance $(G, r, \ell)$ defined on a network
$G$ of parallel links with $\ell_e(x) = a_e x + b_e$ for each edge $e$. Let $i$ and $j$ be two edges of
$G$. Then $\ell_i(f_i^*) \ge \ell_j(f_j^*)$ if and only if $b_i \ge b_j$.

Proof. The lemma is clear when $f_i^* = f_j^* = 0$. Otherwise, we will make use of our
characterization of optimal flows via marginal cost functions (Proposition 2.3.1).
If exactly one of $f_i^*$, $f_j^*$ is 0 (say $f_i^*$), then by Proposition 2.3.1 we know that the
marginal cost $\ell_i^*(f_i^*) = \ell_i(f_i^*) = b_i$ of edge $i$ is at least the marginal cost
$\ell_j^*(f_j^*) = 2a_j f_j^* + b_j$ of edge $j$. Thus we necessarily have both $\ell_i(f_i^*) \ge \ell_j(f_j^*)$ and $b_i \ge b_j$.
Finally, if $f_i^*, f_j^* > 0$ then by Proposition 2.3.1 we have $2a_i f_i^* + b_i = 2a_j f_j^* + b_j = L^*$
for some $L^*$; thus $b_i \ge b_j$ if and only if $a_i f_i^* \le a_j f_j^*$. The lemma follows by writing
$\ell_i(f_i^*) = L^* - a_i f_i^*$ and $\ell_j(f_j^*) = L^* - a_j f_j^*$. ■

Lemma 6.5.4 gives the following equivalent description of the LLF strategy:
saturate edges one-by-one, in decreasing order of constant terms, until no centrally
controlled traffic remains. It may seem surprising that the LLF strategy makes no use of
the $a_i$-values in ordering the edges; however, this is consistent with our observation
in Subsection 6.5.1 that the order in which edges are used by the minimum-latency
flow (if we think of the rate as increasing from 0 to some large value) depends only
on the constant terms of the edges’ latency functions.
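
This description of LLF can be exercised end to end on small instances. The sketch below is an illustration under stated assumptions, not the thesis's own procedure: it computes Nash flows for linear latencies by a water-filling pass (a substitute for the phase formulas), obtains the optimal flow as a Nash flow with respect to marginal costs, saturates edges in decreasing order of constant term, and then lets selfish traffic react. All function names are hypothetical.

```python
from fractions import Fraction as F

def nash_flow(a, b, rate):
    """Nash flow on parallel links with latencies a[i]*x + b[i] (a[i], b[i] >= 0)."""
    rate, m = F(rate), len(a)
    order = sorted(range(m), key=lambda i: (b[i], a[i] == 0))
    flow = [F(0)] * m
    inv_sum = off_sum = F(0)      # sums of 1/a_i and b_i/a_i over links in use
    for pos, i in enumerate(order):
        if a[i] == 0:             # constant link caps the common latency at b[i]
            level = F(b[i])
            for j in order[:pos]:
                flow[j] = (level - b[j]) / a[j]
            flow[i] = rate - sum(flow)
            return flow
        inv_sum += 1 / F(a[i])
        off_sum += F(b[i]) / F(a[i])
        level = (rate + off_sum) / inv_sum      # common latency of links in use
        if pos + 1 == m or level <= b[order[pos + 1]]:
            for j in order[:pos + 1]:
                flow[j] = (level - b[j]) / a[j]
            return flow

def cost(a, b, x):
    """Total latency of flow x."""
    return sum(xi * (ai * xi + bi) for ai, bi, xi in zip(a, b, x))

def llf_induced(a, b, beta, rate=1):
    """Returns (optimal flow, LLF strategy, flow induced by the strategy)."""
    rate = F(rate)
    opt = nash_flow([2 * ai for ai in a], b, rate)    # Nash w.r.t. marginal costs
    budget = F(beta) * rate
    f = [F(0)] * len(a)
    for i in sorted(range(len(a)), key=lambda i: b[i], reverse=True):
        f[i] = min(opt[i], budget)                    # largest constant term first
        budget -= f[i]
    shifted = [bi + ai * fi for ai, bi, fi in zip(a, b, f)]
    g = nash_flow(a, shifted, rate - F(beta) * rate)  # selfish users react to f
    return opt, f, [fi + gi for fi, gi in zip(f, g)]

a, b, beta = [1, 1, 1], [0, 1, 2], F(1, 2)
opt, f, induced = llf_induced(a, b, beta, rate=4)
print(f, induced, cost(a, b, induced) / cost(a, b, opt), 4 / (3 + beta))
```

On the sample instance the ratio of induced to optimal cost stays well below $\frac{4}{3 + \beta}$, the guarantee this section establishes for LLF, and on the modified Pigou example from Section 6.2 the same sketch attains that value exactly.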

We are finally prepared to prove a performance guarantee for the LLF
strategy for networks of parallel links with linear latency functions. The general approach
is similar to that of Theorem 6.4.2 and is again by induction on the number of edges.
However, new difficulties arise in proving a stronger performance guarantee. The
case in which there is some edge $k$ on which the induced equilibrium routes no flow
(i.e., $g_k = 0$ for some edge $k$) is nearly identical to the first case of Theorem 6.4.2,
and the desired performance guarantee can easily be extracted from the inductive
guarantee for the smaller instance induced by the edges on which $g_i > 0$. The second
case, where the induced equilibrium routes traffic on all edges, is substantially
more complicated. In particular, the simple approach in the proof of Theorem 6.4.2
does not use any inductive guarantee in this case and is thus not strong enough to
prove a guarantee better than $\frac{1}{\beta}$. For this reason, much of the proof is devoted to
defining an appropriate smaller instance that allows for clean application of the
inductive hypothesis and to extending the inductive guarantee into one for the original
instance.


Theorem 6.5.5 Let $\mathcal{I} = (G, r, \ell, \beta)$ denote a Stackelberg instance on a network of
$m$ parallel links with linear latency functions. If $f$ is an LLF strategy for $\mathcal{I}$ inducing
equilibrium $g$ and $f^*$ is a minimum-latency flow for $(G, r, \ell)$, then

$$C(f + g) \le \frac{4}{3 + \beta} C(f^*).$$

Proof. We proceed by induction on the number of edges $m$; for each fixed $m$, we
will prove the theorem for arbitrary (linear) $\ell$, $r$, and $\beta$. The case of one edge is
trivial.

Fix a Stackelberg instance $\mathcal{I} = (G, r, \ell, \beta)$ with at least two edges, with edges
labeled 1 to $m$ in the order that they are considered by the LLF strategy; by
Lemma 6.5.4, we may write $\ell_i(x) = a_i x + b_i$ with $b_i$ nondecreasing in $i$. Let $f^*$
denote a minimum-latency flow for $(G, r, \ell)$. We begin with several simplifying
assumptions, each made with no loss of generality. As in Theorem 6.4.2, we may
assume that $r = 1$. We assume (as usual) that there is at most one edge with a
constant latency function. It will also be convenient to assume that the first edge
has constant term 0 (i.e., that $b_1 = 0$). To enforce this assumption we may subtract
$b_1$ from every latency function before applying our argument: assuming $r = 1$,
this modification decreases the cost of every feasible flow by precisely $b_1$ and only
increases the ratio in costs between any two feasible flows. Finally, we assume that
$a_1 > 0$; otherwise, the first edge has latency function $\ell(x) = 0$ and the instance is
trivial.

Let $f$ denote an LLF strategy for $\mathcal{I}$ and $g$ the induced equilibrium. Let $L > 0$
denote the common latency of every edge $i$ on which $g_i > 0$ (see Lemma 6.2.4). We
will  need  to  apply  the  inductive  hypothesis  in  two  different  ways,  and  our  analysis 
breaks  into  two  cases. 

Case 1: Suppose $g_k = 0$ for some edge $k$. As in the proof of Theorem 6.4.2, let $E_1$
denote the edges on which $g_i = 0$ and $E_2$ the edges on which $g_i > 0$. Let $G_1$ and $G_2$
denote the corresponding (non-empty) subgraphs of $G$. For $j = 1, 2$, let $\beta_j$ denote
the amount of centrally controlled traffic routed on edges in $E_j$. For $j = 1, 2$ let
$C_j$ denote the cost incurred by $f + g$ on edges in $E_j$. Observe that $C_1 \ge \beta_1 L$ and
$C_2 = (1 - \beta_1)L$. Since $f^*$ restricted to $E_2$ is an optimal flow for $(G_2, 1 - \beta_1, \ell)$, $f$
restricted to $E_2$ is an LLF strategy for $\mathcal{I}_2 = (G_2, 1 - \beta_1, \ell, \beta')$, where
$\beta' = \beta_2/(1 - \beta_1)$. The inductive hypothesis (applied to $\mathcal{I}_2$) and the fact that
$f^*_i \ge f_i = f_i + g_i$ for all $i \in E_1$ imply that

$$C(f^*) \ge C_1 + \frac{3 + \beta'}{4} C_2.$$

Proving that $C(f + g) \le \frac{4}{3 + \beta} C(f^*)$ thus reduces to showing

$$(3 + \beta)(C_1 + C_2) \le 4C_1 + (3 + \beta')C_2.$$

Since $\beta \le 1$ and $C_1 \ge \beta_1 L$, it suffices to prove this inequality with $C_1$ replaced by
$\beta_1 L$. Writing $C_2 = (1 - \beta_1)L$, $\beta' = \beta_2/(1 - \beta_1)$ and dividing through by $L$ verifies the
result.
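
The last step of Case 1 can be written out explicitly. The following is a sketch, assuming that $\beta = \beta_1 + \beta_2$ (the LLF strategy routes all of the centrally controlled traffic and $r = 1$) and that $\beta' = \beta_2/(1 - \beta_1)$; under these assumptions both sides of the target inequality reduce to the same quantity once $C_1$ is replaced by $\beta_1 L$.

```latex
% Left-hand side with C_1 = \beta_1 L and C_2 = (1 - \beta_1) L:
\[
(3 + \beta)\bigl(\beta_1 L + (1 - \beta_1) L\bigr) = (3 + \beta) L .
\]
% Right-hand side with the same substitutions and \beta' = \beta_2 / (1 - \beta_1):
\[
4 \beta_1 L + \Bigl(3 + \frac{\beta_2}{1 - \beta_1}\Bigr)(1 - \beta_1) L
  = \bigl(3 + \beta_1 + \beta_2\bigr) L
  = (3 + \beta) L .
\]
% Replacing C_1 by its lower bound \beta_1 L is legitimate because the
% coefficient of C_1 on the left, 3 + \beta, is at most its coefficient 4 on the right.
```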

Case 2: Suppose $g_i > 0$ for every edge $i \in E$. This implies that $f + g$ is a Nash flow
for $(G, 1, \ell)$. By Corollary 6.5.3(a) we have $f_m \le f_m + g_m \le f^*_m$ and in particular
$f^*_m > 0$. Since $g_m > 0$, in fact $f_m < f^*_m$. It follows that the LLF strategy $f$ failed
to saturate edge $m$, so $f_m = \beta$ and $f_i = 0$ for $i < m$.

Our first goal is to show that $f$ is an LLF strategy not only for $\mathcal{I}$ but also for
$\mathcal{I}' = (G', 1 - g_1, \ell, \frac{\beta}{1 - g_1})$, where $G'$ is the graph induced by the last $m - 1$ edges of $G$;
we may then apply the inductive hypothesis to $f$ restricted to this smaller instance.
Toward this end, let $\tilde{f}^*$ denote the optimal flow for the instance $(G', 1 - g_1, \ell)$. Since
$f + g$ restricted to $G'$ is a Nash flow for $(G', 1 - g_1, \ell)$, we must have $\tilde{f}^*_m \ge f_m + g_m$
(see Corollary 6.5.3(a)); since $\beta = f_m \le f_m + g_m \le \tilde{f}^*_m$, the LLF strategy for $\mathcal{I}'$ is
precisely $f$ (restricted to the edges of $G'$).

Let $C_1^*, C_2^*$ denote the total latency incurred by $f^*$ on the first edge and in $G'$,
respectively. The next claim gives a lower bound on $C_2^*$ as a function of the amount
of traffic routed on edges of $G'$ in the optimal flow.

Claim: If $r' \ge 1 - g_1$, then the cost of the optimal flow for $(G', r', \ell)$ is at least

$$\frac{3 + \beta'}{4}(1 - g_1)L + (r' - 1 + g_1)L$$

where $\beta' = \frac{\beta}{1 - g_1}$.

Proof. The claim is proved for $r' = 1 - g_1$ by applying the inductive hypothesis
to the instance $\mathcal{I}' = (G', 1 - g_1, \ell, \beta')$ and using the fact that $f$ is an LLF strategy
for $G'$ inducing a flow of cost $(1 - g_1)L$. Suppose now that $r' > 1 - g_1$. We again
denote the optimal flow for $(G', 1 - g_1, \ell)$ by $\tilde{f}^*$. Since $\tilde{f}^*$ and $f + g$ (restricted to
the edges of $G'$) are flows at the same rate (namely, $1 - g_1$) and the common latency
of every edge with respect to $f + g$ is $L$, there is some edge $i$ with $\tilde{f}^*_i > 0$ and
$\ell_i(\tilde{f}^*_i) \ge L$. Since the marginal cost of an edge is at least its latency, Proposition 2.3.1
implies that the marginal cost of every edge in $G'$ is at least $L$ with respect to $\tilde{f}^*$.
By Corollary 6.5.3(c) and the observation that marginal costs are nondecreasing
functions of the edge congestion, extending $\tilde{f}^*$ from an optimal flow for $(G', 1 - g_1, \ell)$
to an optimal flow for $(G', r', \ell)$ involves the routing of $r' - (1 - g_1)$ units of flow, all
routed at a marginal cost of at least $L$. Thus, the overall cost of an optimal flow for
$(G', r', \ell)$ must be at least $C(\tilde{f}^*) + (r' - 1 + g_1)L \ge \frac{3 + \beta'}{4}(1 - g_1)L + (r' - 1 + g_1)L$. ■

Our goal is to prove that $(3 + \beta)C(f + g) \le 4C(f^*) = 4(C_1^* + C_2^*)$; with the
claim in hand, we have reduced the proof of the theorem to proving the inequality

$$(3 + \beta)L \le (3 + \beta')(1 - g_1)L + 4(g_1 - f_1^*)L + 4a_1(f_1^*)^2$$

where $\beta' = \frac{\beta}{1 - g_1}$ (recall that $\ell_1(x) = a_1 x$ and hence $C_1^* = a_1(f_1^*)^2$). For any fixed
value of $g_1$, $f_1^* \in [\frac{1}{2}g_1, g_1]$ (see Corollary 6.5.3(b)). Using the identity $a_1 g_1 = L$ and
differentiating, we find that the right-hand side is minimized by $f_1^* = \frac{1}{2}g_1$. Since the
left-hand side is independent of $f_1^*$, it suffices to prove that

$$(3 + \beta)L \le (3 + \beta')(1 - g_1)L + 2g_1 L + a_1 g_1^2.$$

Substituting for $\beta'$, using the identity $a_1 g_1 = L$, and dividing by $L$ gives

$$3 + \beta \le \Bigl(3 + \frac{\beta}{1 - g_1}\Bigr)(1 - g_1) + 3g_1$$

which clearly holds, proving the theorem. ■
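
The minimization over $f_1^*$ and the final simplification in Case 2 can be checked directly. The sketch below treats the right-hand side as a function $h(x)$ of $x = f_1^*$ (the name $h$ is introduced here only for illustration) and uses the identities $a_1 g_1 = L$ and $\beta' = \beta/(1 - g_1)$ stated in the proof.

```latex
% Right-hand side as a function of x = f_1^*, for fixed g_1:
\begin{align*}
h(x) &= (3 + \beta')(1 - g_1)L + 4(g_1 - x)L + 4 a_1 x^2, \\
h'(x) &= -4L + 8 a_1 x = 0
  \iff x = \frac{L}{2 a_1} = \frac{g_1}{2}, \\
h\!\left(\tfrac{g_1}{2}\right)
  &= (3 + \beta')(1 - g_1)L + 2 g_1 L + a_1 g_1^2
   = (3 + \beta')(1 - g_1)L + 3 g_1 L .
\end{align*}
% h is convex because a_1 > 0, so the stationary point g_1/2 is the minimizer,
% and it lies in the allowed interval [g_1/2, g_1].
% Substituting \beta' = \beta / (1 - g_1) and dividing by L:
\[
\Bigl(3 + \frac{\beta}{1 - g_1}\Bigr)(1 - g_1) + 3 g_1
  = 3 - 3 g_1 + \beta + 3 g_1 = 3 + \beta ,
\]
% so the required inequality holds with equality.
```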

6.6  Complexity  of  Computing  Optimal  Strategies 

Thus far, we have measured the performance of a Stackelberg strategy by comparing
the cost of the corresponding induced flow to the cost of the minimum-latency flow.
Another natural approach for evaluating a strategy is to compare the cost of the
induced flow to that of the least costly flow induced by some Stackelberg strategy,
that is, to the cost of the flow induced by the optimal strategy. Motivated by the
latter measure, in this section we study the optimization problem of computing the
optimal Stackelberg strategy.

We have seen that in networks of parallel links the LLF strategy provides the best
possible (worst-case) performance guarantee relative to the cost of the minimum-latency
flow. In particular, the algorithm of Subsection 6.3.2 may be viewed as
a $\frac{1}{\beta}$-approximation algorithm for computing the optimal Stackelberg strategy in
networks of parallel links and a $\frac{4}{3 + \beta}$-approximation algorithm for such instances
with linear latency functions. Simple examples show that the LLF strategy is not
always the optimal strategy, and thus our algorithm fails to solve this optimization
problem exactly (see Section B.4). Our main result of this section is strong evidence
that no such polynomial-time algorithm exists.

Theorem  6.6.1  The  problem  of  computing  the  optimal  Stackelberg  strategy  is  NP- 
hard,  even  for  instances  in  networks  of  parallel  links  with  linear  latency  functions. 

Proof. We reduce from a problem we call $\frac{1}{3}$-$\frac{2}{3}$ Partition: given $n$ positive integers
$a_1, a_2, \ldots, a_n$, is there a subset $S \subseteq \{1, 2, \ldots, n\}$ satisfying $\sum_{i \in S} a_i = \frac{2}{3} \sum_{i=1}^{n} a_i$? The
canonical reduction from the NP-complete problem Subset Sum to Partition is
easily modified to show that $\frac{1}{3}$-$\frac{2}{3}$ Partition is NP-hard (see Garey and Johnson [77]
for problem definitions and Karp [95] or Kozen [109, p. 129] for the canonical reduction).
We will show that deciding the problem $\frac{1}{3}$-$\frac{2}{3}$ Partition reduces to deciding
whether or not a given Stackelberg instance on a network of parallel links with linear
latency functions admits a Stackelberg strategy inducing a flow with a given cost.

Given an arbitrary instance $\mathcal{I}$ of $\frac{1}{3}$-$\frac{2}{3}$ Partition specified by positive integers
$a_1, \ldots, a_n$, put $A = \sum_{i=1}^{n} a_i$ and define a Stackelberg instance $\mathcal{I}' = (G, 2A, \ell, \frac{1}{4})$ on
a network $G$ of parallel links with edge set $\{1, 2, \ldots, n + 1\}$ and with linear latency
functions $\ell_i(x) = \frac{x}{a_i} + 4$ for $i = 1, \ldots, n$ and $\ell_{n+1}(x) = \frac{3x}{A}$. It is clear that $\mathcal{I}'$ can be
constructed from $\mathcal{I}$ in polynomial time.^5 We claim that $\mathcal{I}$ is a “yes instance” (that
is, admits a $\frac{1}{3}$-$\frac{2}{3}$ partition) if and only if there is a Stackelberg strategy for instance
$\mathcal{I}'$ inducing a flow with cost at most $\frac{35}{4}A$.

First suppose $\mathcal{I}$ is a “yes instance” of $\frac{1}{3}$-$\frac{2}{3}$ Partition, with $S \subseteq \{1, 2, \ldots, n\}$
satisfying $\sum_{i \in S} a_i = \frac{2}{3}A$, and consider the strategy defined by $f_i = \frac{3}{4}a_i$ for $i \in S$
and $f_i = 0$ otherwise (since $\sum_{i=1}^{n+1} f_i = \frac{3}{4}\sum_{i \in S} a_i = \frac{3}{4} \cdot \frac{2}{3}A = \frac{1}{2}A$, this defines a
Stackelberg strategy). The induced equilibrium is then $g_i = 0$ for $i \in S$, $g_i = \frac{1}{4}a_i$
for $i \in \{1, 2, \ldots, n\} \setminus S$, and $g_{n+1} = \frac{17}{12}A$. In the induced flow $f + g$, the $A/2$ units
of traffic on edges corresponding to $S$ experience $\frac{19}{4}$ units of latency, while the other
$3A/2$ units of traffic experience $\frac{17}{4}$ units of latency. The cost of $f + g$ is thus

$$\frac{A}{2} \cdot \frac{19}{4} + \frac{3A}{2} \cdot \frac{17}{4} = \frac{35}{4}A.$$

^5 We are assuming that a linear latency function is represented in the problem instance $\mathcal{I}'$ by
the binary encodings of its two coefficients. See Section 5.2 for more details on the encoding of an
instance.
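
The "yes instance" computation can be reproduced numerically. The following is a minimal sketch with exact rational arithmetic; the function name `induced_flow_cost` and the 0-based edge indexing are illustrative choices, and the code assumes (as in the argument above) that the selfish traffic uses exactly the edges outside $S$ together with edge $n + 1$.

```python
from fractions import Fraction as F

def induced_flow_cost(a, S):
    """Cost of f + g when the leader routes f_i = (3/4) a_i on the edges in S.

    Edges 0..n-1 have latency x/a_i + 4 and one extra edge has latency 3x/A;
    the total traffic rate is 2A. Assumes the selfish traffic ends up on the
    edges outside S and on the extra edge (checked by the assertion below).
    """
    A = sum(a)
    heavy_latency = F(3, 4) + 4                      # (3/4 a_i)/a_i + 4 = 19/4
    leader = sum(F(3, 4) * a[i] for i in S)          # centrally controlled traffic
    selfish = 2 * A - leader                         # traffic routed selfishly
    a_light = sum(a[i] for i in range(len(a)) if i not in S)
    # Light edges share a latency ell: edge i carries a_i (ell - 4) and the
    # extra edge carries A * ell / 3, and these loads sum to the selfish traffic.
    ell = (selfish + 4 * a_light) / (a_light + F(A, 3))
    assert 4 <= ell <= heavy_latency, "selfish traffic would not avoid S"
    return leader * heavy_latency + selfish * ell

# Example: a = (1, 2, 3) has A = 6, and S = {0, 2} picks out 1 + 3 = 4 = (2/3) A.
cost = induced_flow_cost([1, 2, 3], {0, 2})
```

In this example the leader routes $A/2 = 3$ units, the light latency comes out to $17/4$, and the cost equals $\frac{35}{4}A$.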

Now suppose that $\mathcal{I}$ is a “no instance” of $\frac{1}{3}$-$\frac{2}{3}$ Partition, and consider any
Stackelberg strategy $f$ for $\mathcal{I}'$, inducing equilibrium $g$. We need to show that
$C(f + g) > \frac{35}{4}A$. Call edge $i \in \{1, 2, \ldots, n + 1\}$ heavy if $g_i = 0$ and light otherwise. Our
first observation is that edge $n + 1$ must be light (even if all centrally controlled
traffic is routed on edge $n + 1$, some selfish traffic will use it). Next, we note that for
$i, j \in \{1, 2, \ldots, n\}$, the marginal cost $\frac{2(f_i + g_i)}{a_i} + 4$ of edge $i$ is at most the marginal
cost $\frac{2(f_j + g_j)}{a_j} + 4$ of edge $j$ if and only if the latency $\ell_i(f_i + g_i)$ of edge $i$ is at most
$\ell_j(f_j + g_j)$. We may assume that all heavy edges have the same marginal cost with
respect to $f + g$, as rerouting some centrally controlled traffic from a heavy edge
with large marginal cost to a heavy edge with small marginal cost does not affect
the induced equilibrium and can only decrease the cost of the induced flow. We can
therefore also assume that all heavy edges have equal latency with respect to $f + g$.
Naturally, Lemma 6.2.4 implies that all light edges possess a common latency with
respect to $f + g$. That all edges have one of two latencies will make the cost of $f + g$
easy to compute.

With the induced flow $f + g$ still fixed, let $S \subseteq \{1, 2, \ldots, n\}$ denote the set of
heavy edges. If $S = \emptyset$ then $f + g$ is the Nash flow for $\mathcal{I}'$ with $f_i + g_i = \frac{a_i}{2}$ for
$i \in \{1, 2, \ldots, n\}$ and $f_{n+1} + g_{n+1} = \frac{3}{2}A$, satisfying $C(f + g) = 9A > \frac{35}{4}A$. So suppose
$S$ is non-empty and define $\lambda \in (0, 1]$ by the equation $\sum_{i \in S} a_i = \lambda A$. Define $\mu \in (0, \frac{1}{2}]$
by the equation $\sum_{i \in S} f_i = \mu A$. Our aim is to lower bound the cost of $f + g$ as a
function of the parameters $\lambda$ and $\mu$.^6

Since all heavy edges have equal latency, for $i$ heavy we must have $f_i + g_i = f_i =
\frac{\mu}{\lambda}a_i$ with $\ell_i(f_i + g_i) = 4 + \frac{\mu}{\lambda}$. Since all light edges have equal latency, we must have

$$f_i + g_i = a_i \cdot \frac{2 - 3\mu}{4 - 3\lambda}$$

with

$$\ell_i(f_i + g_i) = 4 + \frac{2 - 3\mu}{4 - 3\lambda}$$

for $i \in \{1, 2, \ldots, n\} \setminus S$ and

$$f_{n+1} + g_{n+1} = A\left(\frac{4}{3} + \frac{2 - 3\mu}{12 - 9\lambda}\right)$$

with

$$\ell_{n+1}(f_{n+1} + g_{n+1}) = 4 + \frac{2 - 3\mu}{4 - 3\lambda}.$$

The total cost of this solution is

$$C(f + g) = \mu A\left(4 + \frac{\mu}{\lambda}\right) + (2 - \mu)A\left(4 + \frac{2 - 3\mu}{4 - 3\lambda}\right)
= A\left(8 + \frac{(4 - 3\lambda)\mu^2 + \lambda(2 - \mu)(2 - 3\mu)}{\lambda(4 - 3\lambda)}\right).$$

^6 Not all joint values of $\lambda \in (0, 1]$ and $\mu \in (0, \frac{1}{2}]$ are achievable with Stackelberg strategies, but
this will not hinder our analysis.

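
Holding $\lambda$ fixed and differentiating with respect to $\mu$, we find that this expression
has unique minimizer $\mu = \lambda$ when $\lambda \le \frac{1}{2}$ and $\mu = \frac{1}{2}$ when $\lambda \ge \frac{1}{2}$ (subject to the
condition $\mu \in (0, \frac{1}{2}]$). There are now two cases to analyze. First suppose that $\lambda \le \frac{1}{2}$.
Setting $\mu = \lambda$ we obtain

$$C(f + g) = A\left(8 + \frac{4(1 - \lambda)}{4 - 3\lambda}\right);$$

differentiating with respect to $\lambda$, we see that the expression has a unique minimizer
$\lambda = \frac{1}{2}$ (subject to the condition $\lambda \in (0, \frac{1}{2}]$) yielding cost $A(8 + \frac{4}{5}) > \frac{35}{4}A$. Finally,
assume that $\lambda \ge \frac{1}{2}$. Setting $\mu = \frac{1}{2}$ we find that the cost of $f + g$ is given by

$$C(f + g) = A\left(8 + \frac{1}{\lambda(4 - 3\lambda)}\right);$$

differentiating with respect to $\lambda$, we find that this expression has unique minimizer
$\lambda = \frac{2}{3}$, at which point the equation reads $C(f + g) = \frac{35}{4}A$. However, since $\mathcal{I}$ is a
“no instance” of $\frac{1}{3}$-$\frac{2}{3}$ Partition, we must have $\lambda \ne \frac{2}{3}$ and hence $C(f + g) > \frac{35}{4}A$.
We have exhausted all possible cases, and the reduction is complete. ■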
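
The two-case minimization can be sanity-checked by evaluating the cost expression on a grid. This is only an illustrative sketch, not part of the proof: the function names are hypothetical, the grid is a finite sample of $\lambda \in (0, 1]$ and $\mu \in (0, \frac{1}{2}]$, and the cost is normalized by $A$.

```python
from fractions import Fraction as F

def cost_over_A(lam, mu):
    """C(f + g) / A when the heavy edges have total a_i equal to lam * A and
    carry mu * A units of centrally controlled traffic."""
    numerator = (4 - 3 * lam) * mu ** 2 + lam * (2 - mu) * (2 - 3 * mu)
    return 8 + numerator / (lam * (4 - 3 * lam))

def grid_minimum(steps=60):
    """Smallest value of cost_over_A on the grid lam = k/steps, mu = j/steps,
    with 0 < lam <= 1 and 0 < mu <= 1/2, together with a minimizing pair."""
    best = None
    for k in range(1, steps + 1):
        lam = F(k, steps)
        for j in range(1, steps // 2 + 1):
            mu = F(j, steps)
            value = cost_over_A(lam, mu)
            if best is None or value < best[0]:
                best = (value, lam, mu)
    return best

value, lam, mu = grid_minimum()
```

On this grid the minimum is $\frac{35}{4}$, attained at $\lambda = \frac{2}{3}$ and $\mu = \frac{1}{2}$, which matches the yes-instance strategy; the boundary point $\lambda = \mu = \frac{1}{2}$ gives $8 + \frac{4}{5}$.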


Chapter  7 

Recent  and  Future  Work 


In  this  thesis,  we  have  studied  the  loss  in  network  performance  due  to  selfish  routing. 
We  have  both  quantified  the  worst-possible  ratio  between  flows  at  Nash  equilibrium 
and  the  best  coordinated  outcome  (the  minimum-latency  flow)  and  investigated  how 
to control the inefficiency inherent in a Nash flow via network design and Stackelberg
strategies. While we have succeeded in answering a few basic questions about
selfish  routing,  we  feel  that  our  work  is  only  a  small  step  toward  understanding  the 
complex  interactions  between  selfish  behavior  and  a  desire  for  global  optimization 
in  networks.  In  the  hope  of  conveying  this  sentiment  to  the  reader,  in  this  final 
chapter  we  suggest  some  open  questions  and  unexplored  directions  that  beckon  for 
further  research.  We  also  summarize  recent  work  related  to  these  topics.  Our  list 
is  not  exhaustive  and  is  meant  only  to  indicate  the  wide  array  of  possibilities  for 
future  work;  the  imaginative  reader  will  doubtless  discover  further  interesting  lines 
of  inquiry. 

How  Common  is  Braess’s  Paradox? 

In  Chapter  5  we  studied  the  worst-possible  increase  in  total  latency  due  to  harmful 
extraneous  edges  in  a  network,  thereby  determining  the  extent  to  which  Braess’s 
Paradox generalizes to and becomes more severe in large networks. Since we examined
this issue through the lens of worst-case analysis, we were naturally led to
“extremal”  bad  examples  (the  Braess  graphs  of  Subsection  5.4.2)  unlikely  to  occur 
in  practice.  The  following  important  question  is  as  yet  poorly  understood:  to  what 
extent  does  Braess’s  Paradox  occur  in  “typical”  networks? 

Open Question 1 For a single-commodity instance $(G, r, \ell)$, let $L(G, r, \ell)$ denote
the common latency of every flow path of a Nash flow for $(G, r, \ell)$ (as in Chapter 5).
Let $\tau(G, r, \ell) \ge 1$ denote the largest ratio between $L(G, r, \ell)$ and $L(H, r, \ell)$ for a
subgraph $H$ of $G$. What can be said about the distribution of $\tau(G, r, \ell)$ for some
“reasonable” distribution on single-commodity instances $(G, r, \ell)$? How often can
removing edges improve the flow at Nash equilibrium, that is, for what fraction of
instances is $\tau(G, r, \ell) > 1$?

Remark 7.0.1 As an example, we can obtain a simple yet nontrivial distribution
on instances $(G, r, \ell)$ by adapting the classical random graph model $G(n, p)$ of Erdős
and Rényi [23, 61]. Fix parameters $n \in \mathbb{N}$ and $p \in (0, 1)$. Define an instance $(G, r, \ell)$
by the following random process: set the vertex set of $G$ to be $V = \{1, 2, \ldots, n\}$;
independently for each ordered pair $(i, j)$ of distinct vertices, include (one copy of)
edge $(i, j)$ with probability $p$; independently for each included edge $e$, assign $e$ the
latency function $\ell(x) = 1$ or $\ell(x) = x$, chosen uniformly at random; set vertex 1 to
be the source, vertex 2 to be the destination, and the traffic rate $r$ to be 1. What
is $E[\tau(G, r, \ell)]$? Does this expectation tend to a limit as $n \to \infty$ for a fixed choice
of $p$ (say, $p = \frac{1}{2}$)?
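
The random process of Remark 7.0.1 is straightforward to sample. The sketch below is one possible encoding, not something prescribed by the text: the function name, the dictionary layout, and the representation of a latency function $\ell(x) = ax + b$ as a coefficient pair `(a, b)` are all assumptions made for illustration. It only generates instances; it does not check that vertex 2 is reachable from vertex 1, nor does it compute Nash flows.

```python
import random

def random_instance(n, p, seed=None):
    """Sample an instance following the process of Remark 7.0.1.

    Vertices are 1..n. Each ordered pair (i, j) of distinct vertices becomes
    an edge independently with probability p. Each edge gets latency
    l(x) = 1 or l(x) = x uniformly at random, stored as (a, b) for a*x + b.
    """
    rng = random.Random(seed)
    edges = {}
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if i != j and rng.random() < p:
                # (0, 1) encodes the constant latency 1; (1, 0) encodes l(x) = x.
                edges[(i, j)] = rng.choice([(0, 1), (1, 0)])
    return {
        "vertices": list(range(1, n + 1)),
        "edges": edges,
        "source": 1,
        "destination": 2,
        "rate": 1.0,
    }

instance = random_instance(6, 0.5, seed=1)
```

Estimating the expectation asked about in the remark would additionally require a Nash flow solver and a search over subgraphs, neither of which is sketched here.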

Remark 7.0.2 It is a “folklore” belief that instances with $\tau$-value greater than 1
are fairly common, and thus Braess’s Paradox fails to qualify as a “pathological”
example. Some headway in this direction has been made by Steinberg and Zangwill
[176] (whose approach was later extended to more general traffic models by
Dafermos and Nagurney [48]), who argue that “Braess’s Paradox is about as likely
to occur as not occur” [176, p. 312]. The analysis of [176] leaves much to be done,
however: Steinberg and Zangwill [176] restrict attention to subgraphs $H$ with one
less edge than $G$, assume that every edge used by the Nash flow in $H$ is also used by
the Nash flow in $G$ (an assumption that fails in our version of Braess’s Paradox;
see Subsections 1.2.2 and 2.4.2), and do not specify a probability distribution on
problem instances.

A related (and easier) question is the following: for what networks $G$ is there
a traffic rate $r$ and a set of edge latency functions $\ell$ such that $\tau(G, r, \ell) > 1$? Let
us call such a network vulnerable. Vulnerable networks are therefore the networks
that, under an adversarial choice of latency functions and traffic rate, suffer degradation
in network performance due to undesirable extra edges. Confining our study
to networks (always assumed to possess a worst-case choice of latency functions)
rather than to instances (networks already endowed with an arbitrary set of latency
functions) simplifies matters considerably. This fact is illustrated by the following
characterization of vulnerable networks, asserted by Murchland [128] and proved in
detail by Milchtaich [125].

Fact  7.0.3  ([125,  128])  Let  G  be  a  directed  graph  with  source  vertex  s,  destination 
vertex  t,  and  with  every  vertex  lying  on  some  s-t  path.  Then  the  following  are 
equivalent: 

(1)  G  is  vulnerable 

(2)  G  contains  a  subdivision  of  the  network  of  Braess’s  Paradox  (Figure  2.2)  as  a 
subgraph. 

By a well-known forbidden subgraph characterization of series-parallel graphs [57,
182], Fact 7.0.3 implies that the vulnerable graphs are precisely those for which the
subgraph induced by the vertices lying on some s-t path fails to be two-terminal
series-parallel. This in turn implies that vulnerable graphs can be recognized in
linear time [182]. Fact 7.0.3 also shows that vulnerable graphs are ubiquitous (the
class of two-terminal series-parallel directed graphs is a restrictive one), a fact that
could be useful in proving that “most” instances have $\tau$-value greater than 1; see
Open Question 1 above.
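
To make the recognition step concrete, here is a minimal sketch of a series-parallel test based on the standard series and parallel reductions. It is not the linear-time algorithm of [182]: it is a simple quadratic-time illustration, the function name is hypothetical, and it assumes the input has already been restricted to vertices lying on some s-t path. A directed multigraph is treated as two-terminal series-parallel exactly when the reductions collapse it to the single edge (s, t), so by Fact 7.0.3 a `False` result corresponds to a vulnerable network under that assumption.

```python
from collections import Counter

def is_two_terminal_series_parallel(edges, s, t):
    """Return True if the directed multigraph reduces to the single edge (s, t).

    edges: iterable of (tail, head) pairs; repeated pairs are parallel edges.
    """
    multi = Counter(edges)
    changed = True
    while changed:
        changed = False
        # Parallel reduction: collapse every bundle of parallel edges to one edge.
        for e in list(multi):
            if multi[e] > 1:
                multi[e] = 1
                changed = True
        # Series reduction: bypass one internal vertex with in-degree 1 and
        # out-degree 1 (self-loops are never bypassed).
        for v in {x for e in multi for x in e} - {s, t}:
            ins = [e for e in multi if e[1] == v]
            outs = [e for e in multi if e[0] == v]
            if len(ins) == 1 and len(outs) == 1 and ins[0] != outs[0]:
                u, w = ins[0][0], outs[0][1]
                del multi[ins[0]]
                del multi[outs[0]]
                multi[(u, w)] += 1
                changed = True
                break
    return list(multi.elements()) == [(s, t)]

# The network of Braess's Paradox: no reduction applies, so it is not series-parallel.
braess = [("s", "v"), ("s", "w"), ("v", "w"), ("v", "t"), ("w", "t")]
# Two disjoint two-hop paths from s to t: series-parallel.
two_paths = [("s", "a"), ("a", "t"), ("s", "b"), ("b", "t")]
```

Here `is_two_terminal_series_parallel(braess, "s", "t")` is `False` while the call on `two_paths` is `True`.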

This characterization of vulnerable graphs stands in stark contrast to the problem
of identifying the instances $(G, r, \ell)$ satisfying $\tau(G, r, \ell) > 1$; our hardness results of
Chapter 5 imply that, assuming P ≠ NP, such instances have no similarly simple
(to be precise, polynomial-time checkable) characterization. Given the simplicity of
Fact 7.0.3 and its proof, it is natural to seek generalizations. Toward this end, we
will say that a network $G$ is $c$-vulnerable if there is a traffic rate $r$ and a set of latency
functions $\ell$ such that $\tau(G, r, \ell) > c$. Recalling the Braess graphs of Subsection 5.4.2,
we pose the following problem.

Open  Question  2  Prove  or  disprove:  there  is  a  function  g  such  that  every  g(c)- 
vulnerable  network  contains  a  subdivision  of  the  cth  Braess  graph  Bc  as  a  subgraph. 

Remark 7.0.4 Fact 7.0.3 implies that the trivial algorithm (the network design
heuristic of building the whole network) is optimal for networks that exclude subdivisions
of the first Braess graph (Figure 2.2) as subgraphs. Similarly, a positive resolution
to Open Question 2 would prove that the trivial algorithm has constant approximation
ratio for networks that exclude subdivisions of sufficiently large Braess
graphs.

The Average Price of Anarchy

Throughout Chapter 3, we were of a single mind: to compute the price of anarchy,
defined as the worst-possible ratio $\rho(G, r, \ell)$ between the costs of a Nash and of an
optimal flow for an instance $(G, r, \ell)$. As with Braess’s Paradox, little is known
about the value of $\rho$ in “typical” instances.

Open Question 3 What can be said about the distribution of $\rho(G, r, \ell)$ for some
“reasonable” distribution on instances $(G, r, \ell)$?

Progress  on  this  question  for  any  nontrivial  class  of  instances  (such  as  the  setting 
outlined  in  Remark  7.0.1)  would  be  of  interest. 

Friedman [74] recently proved an interesting result related to Open Question 3,
stating that in a network with arbitrary latency functions, for “most” traffic rate
vectors the cost of selfish routing is much smaller than the worst-case value. To
state his result more precisely, fix a network $G$ with latency functions $\ell$, and let
$N(r)$ be the cost of a Nash flow for instance $(G, r, \ell)$. Friedman uses the ratio
$A(r) = N(r)/N(r/2)$ as a sensitivity measure of the problem instance $(G, r, \ell)$.
Applying Theorem 3.6.1 to $(G, r/2, \ell)$ shows that the ratio $\rho(G, r, \ell)$ between the
cost of the Nash and optimal flows for $(G, r, \ell)$ is bounded above by $A(r)$, and this
bound can be achieved. Friedman [74] shows that for “most” traffic rate vectors $r'$
in $[r/2, r]$, the ratio $\rho(G, r', \ell)$ is only $O(\log A(r))$.
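
The upper bound on $\rho$ by the sensitivity ratio is a one-line consequence of the bicriteria bound. The sketch below assumes that Theorem 3.6.1 is the bicriteria result of Section 3.6 in the form "a Nash flow at rate $r/2$ costs no more than an optimal flow at rate $r$"; the symbol $\mathrm{OPT}(r)$ for the cost of an optimal flow at rate $r$ is introduced here only for illustration.

```latex
% Assumed form of the bicriteria bound, applied to (G, r/2, \ell):
%   N(r/2) \le \mathrm{OPT}(r).
\[
\rho(G, r, \ell)
  = \frac{N(r)}{\mathrm{OPT}(r)}
  \le \frac{N(r)}{N(r/2)}
  = A(r) .
\]
```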

Stackelberg  Routing 

In Chapter 6 we studied the problem of indirectly controlling selfish network users
via Stackelberg strategies, that is, by routing a small fraction of the overall traffic
centrally. Our work concerned only networks of parallel links, leaving the important
generalization to arbitrary networks open. While the bad example of Section B.3
shows that the guarantee of Theorem 6.4.2 for networks of parallel links cannot
be extended to arbitrary networks, a guarantee with worse dependence on $\beta$ (the
fraction of traffic that is centrally controlled) may be possible.

Open Question 4 Is there a function $g(\cdot)$ such that the following statement holds:
for any single-commodity Stackelberg instance $(G, r, \ell, \beta)$ with standard latency
functions, there is an efficiently computable Stackelberg strategy that induces a
flow with cost at most $g(\beta) \cdot C(f^*)$, where $f^*$ is an optimal flow for $(G, r, \ell)$?

We emphasize that the function $g$ can have arbitrary dependence on $\beta$, but is
independent of the size of the network $G$.

Remark 7.0.5 We confine our question to single-commodity instances because
multicommodity instances can be largely immune to Stackelberg strategies. Precisely,
there are instances with $n$ vertices and $k = O(n)$ commodities such that any
Stackelberg strategy routing half of the traffic of each commodity induces a flow
with cost $\Omega(k)$ times that of the optimal routing of all of the traffic.

We noted in Section 6.6 that we evaluate a Stackelberg strategy by comparing
the cost of the flow it induces to that of an optimal routing of all of the traffic, rather
than to the minimum-latency flow induced by some Stackelberg strategy. Outside
of our hardness result for computing Stackelberg strategies (Theorem 6.6.1), we
have not considered the complexity of the optimization problem of computing the
best Stackelberg strategy. Recent work by Kumar and Marathe [112] resolves this
problem for networks of parallel links with a fully polynomial-time approximation
scheme^1 (FPTAS) for the problem under mild conditions on the network latency
functions. The results of [112] also apply to networks slightly more general than
those of parallel links, but the problem of approximating the optimal Stackelberg
strategy in general networks remains open.

Admission  Control 

Throughout this work, we have assumed that all traffic rates are given and immutable.
What if rates are under control of the network manager, that is, what
if admission control is permitted? Several algorithmic questions arise in this setting;
we will describe one in detail. Let us consider a network in which the amount
of traffic between each source-destination pair is easy to control, but centralized
routing is infeasible. The network manager wishes to maximize the amount of traffic
routed (e.g., in order to maximize revenue). To make the problem nontrivial,
we impose quality of service (QoS) constraints: to each commodity $i$ we associate
a threshold $L_i^{\max}$ representing the maximum amount of latency that network users
corresponding to commodity $i$ are willing to tolerate. The manager’s problem is then
to maximize the amount of traffic routed subject to the QoS constraints, assuming
selfish routing.

^1 A fully polynomial-time approximation scheme for a minimization problem is an algorithm $A$
with the following property for some polynomial $p(\cdot, \cdot)$: given error parameter $\epsilon > 0$ and problem
instance $\mathcal{I}$ with size $|\mathcal{I}|$, $A$ returns a solution to $\mathcal{I}$ with objective function value at most $1 + \epsilon$ times
that of optimal in time at most $p(|\mathcal{I}|, \epsilon^{-1})$.

Open Question 5 Design a good approximation algorithm for the following optimization
problem: given a network $G$ with latency functions $\ell$, a vector $r^{\max}$ of
maximum allowable traffic rates, and a vector $L^{\max}$ of QoS constraints, find a traffic
rate vector $r$ maximizing $\sum_i r_i$ subject to $r_i \le r_i^{\max}$ and $L_i(f) \le L_i^{\max}$ for each
commodity $i$, where $L_i(f)$ is the common latency of every $s_i$-$t_i$ flow path in a Nash
flow $f$ for $(G, r, \ell)$.

Remark  7.0.6  The  optimization  problem  posed  in  Open  Question  5  completely 
ignores  the  issue  of  fairness,  in  that  the  optimal  solution  may  route  an  enormous 
amount  of  one  commodity  and  none  of  another.  Addressing  fairness  concerns  such 
as  this  is  yet  another  wide  open  area  for  future  work. 

The  Price  of  Selfishness  in  Other  Games 

While  open  questions  about  the  traffic  model  studied  in  this  dissertation  abound, 
an  even  more  exciting  direction  for  future  research  is  the  study  of  the  inefficiency  of 
selfish  behavior  in  other  games  (in  networks  and  otherwise).  Before  elaborating  on 
this  point,  we  briefly  mention  some  recent  efforts  along  these  lines.  Two  papers  that 
generalize  models  previously  mentioned  in  this  work  and  then  study  the  price  of 
selfishness  are  Schulz  and  Stier  Moses  [167],  who  extend  the  traffic  routing  model  of 
Chapter  2  to  networks  with  explicit  edge  capacity  constraints,  and  Czumaj  et  al.  [43], 
who  augment  the  load-balancing  model  of  Koutsoupias  and  Papadimitriou  [108] 
by  allowing  arbitrary  (nonlinear)  objective  functions.  Vetta  [183]  departs  more 
significantly  from  previous  work  and  studies  the  inefficiency  of  Nash  equilibria  in  a 
broad  class  of  profit-maximization  problems  that  includes  auctions,  facility  location 
games,  as  well  as  games  related  to  the  selfish  routing  problems  of  this  thesis.  To 
connect  Vetta’s  work  with  ours,  define  a  game  in  a  multicommodity  flow  network 
where  players  correspond  to  commodities,  and  each  player  controls  both  the  routing 
of  its  flow  (cf.,  the  finite  splittable  instances  of  Section  4.2)  and  its  traffic  rate. 
Player  i  receives  revenue  for  each  unit  of  flow  it  sends,  and  experiences  cost  equal 
to  the  total  latency  incurred  by  flow  of  commodity  i;  player  i’s  objective  function 
is  to  maximize  its  profit  (revenue  minus  cost),  and  the  global  objective  function  is 
defined  as  the  sum  of  all  profits  (equivalently,  total  revenue  minus  total  latency).  A 
consequence of Vetta’s work is that, under certain conditions, a Nash equilibrium
of  this  game  will  obtain  at  least  half  of  the  profit  enjoyed  by  the  best  coordinated 
outcome;  see  [183]  for  further  details. 

Given the prevalence of game-theoretic analysis in the networking literature
(illustrated by, for example, the survey of Altman et al. [6] and the many references
therein), we expect the idea of quantifying the inefficiency arising from selfish
behavior to find numerous applications beyond those of this dissertation and of the
papers mentioned above. Moreover, we believe that a key contribution of our work
is the identification of several questions about the inefficiency of Nash equilibria (or
of other game-theoretic solution concepts) that are likely to have clean and nontrivial
solutions. We conclude by making this assertion concrete and offering a set of
questions that should constitute a general and useful paradigm for analyzing selfish
behavior in future research:

-  What is the worst-case ratio between the objective function value (perhaps
the sum or the minimum of player utilities) of a selfish equilibrium and that of
the best coordinated outcome? (Due originally to Koutsoupias and Papadimitriou
[108].)

-  Are  there  other  types  of  comparisons  that  bound  the  price  of  selfishness  in 
a  meaningful  way?  (Cf.,  our  bicriteria  bound  in  Section  3.6  and  Friedman’s 
“average-case”  price  of  anarchy  result  mentioned  above.) 

-  What are the “sources of inefficiency” for selfish equilibria? Do simple games
suffer from the worst-possible consequences of uncoordinated behavior? (For
example, we saw in Sections 3.3-3.4 that the complexity of the underlying
network topology in essence fails to contribute to the inefficiency of flows at
Nash equilibrium.)

-  Are  there  natural  design  and/or  management  principles  that  ensure  that  the 
price  of  selfishness  is  reasonable? 


Part  IV 
Appendices 
Appendix A
Odds  and  Ends 


This appendix gathers together some results about selfish routing that may be of
interest but do not fall within the scope of the main text. We begin in Section A.1
with a “quick and dirty” upper bound on the price of anarchy that follows relatively
easily from our work in Chapter 2. In Section A.2 we describe different methods of
quantifying the “steepness” of network latency functions. In Section A.3 we apply
one of these methods to quantify the potential “unfairness” of optimal flows.

A.1  A “Quick and Dirty” Upper Bound on the
Price of Anarchy

The  proof  of  Proposition  2.5.1  provides  a  fairly  general  method  for  upper-bounding 
the  ratio  p  between  the  cost  of  a  flow  at  Nash  equilibrium  and  of  a  minimum-latency 
flow.  Specifically,  we  have  the  following  theorem. 

Theorem A.1.1 Suppose the instance $(G, r, \ell)$ and the constant $\gamma \ge 1$ satisfy

$$x \cdot \ell_e(x) \le \gamma \cdot \int_0^x \ell_e(t)\,dt$$

for all edges $e$ and all positive real numbers $x$. Then

$$\rho(G, r, \ell) \le \gamma.$$

Proof. Roughly speaking, the theorem holds since a flow at Nash equilibrium for
$(G, r, \ell)$ optimizes an objective function (the objective function of (NLP2) in the
proof of Proposition 2.5.1) that is at most a factor $\gamma$ away from the true objective
function $C(\cdot)$. More formally, let $f$ and $f^*$ denote Nash and optimal flows for
$(G, r, \ell)$, respectively; we can then derive

$$\begin{aligned}
C(f) &= \sum_{e \in E} \ell_e(f_e) f_e \\
     &\le \gamma \sum_{e \in E} \int_0^{f_e} \ell_e(t)\,dt \\
     &\le \gamma \sum_{e \in E} \int_0^{f^*_e} \ell_e(t)\,dt \\
     &\le \gamma \sum_{e \in E} \ell_e(f^*_e) f^*_e \\
     &= \gamma \cdot C(f^*)
\end{aligned}$$

where the first inequality follows from the hypothesis, the second inequality from
the fact that the Nash flow $f$ optimizes the objective function $\sum_{e \in E} \int_0^{f_e} \ell_e(t)\,dt$ (see
Proposition 2.5.1), and the third inequality from the assumption that every latency
function $\ell_e$ is nondecreasing. ■

Remark A.1.2 Theorem A.1.1 and its proof do not make use of the combinatorial
structure possessed by a network, and therefore apply more generally to the
nonatomic congestion games of Section 4.4.

While the hypothesis of Theorem A.1.1 is somewhat opaque, it nevertheless gives
a nontrivial upper bound on the cost of selfish routing for many instances, such as
instances with latency functions that are polynomials with nonnegative coefficients.

Corollary A.1.3 Suppose every latency function of instance $(G, r, \ell)$ is a polynomial
with nonnegative coefficients and degree at most $p$. Then,

$$\rho(G, r, \ell) \le p + 1.$$
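
As a sanity check on Corollary A.1.3, the following minimal sketch evaluates the cost ratio in a nonlinear Pigou-style network. It assumes a two-node network with two parallel links, latencies $\ell_1(x) = 1$ and $\ell_2(x) = x^p$, and traffic rate 1; the function name `pigou_ratio` is illustrative only.

```python
def pigou_ratio(p):
    """Ratio of Nash cost to optimal cost in the assumed two-link network.

    Link 1 has constant latency 1, link 2 has latency x**p, traffic rate is 1.
    """
    # Nash flow: all traffic uses link 2, whose latency is then 1**p = 1.
    nash_cost = 1.0
    # Optimal flow on link 2 equalizes marginal costs: (p + 1) * x**p = 1.
    x = (p + 1) ** (-1.0 / p)
    opt_cost = (1 - x) * 1.0 + x * x ** p
    return nash_cost / opt_cost


for p in (1, 2, 5, 10):
    # The ratio stays below the "quick and dirty" bound p + 1.
    print(p, round(pigou_ratio(p), 4), "<=", p + 1)
```

In this family the ratio grows much more slowly than $p + 1$, which is consistent with the corollary giving a valid but not necessarily tight bound.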

Remark A.1.4 A comparison of Table 3.1 and Corollary A.1.3 shows that the more
sophisticated approach to bounding the price of anarchy presented in Chapter 3 (in
particular, Theorem 3.3.8) can give a better guarantee than that of Theorem A.1.1.
On the other hand, simple two-node, two-link examples show that the conclusion of
Theorem A.1.1 cannot be improved without refining the hypothesis.
A.2 Notions of Steepness

A.2.1 Incline

A theme of this dissertation is the dependence of the price of anarchy (as well as
other quantities of interest) on the class of allowable edge latency functions; the
intuition afforded by the nonlinear version of Pigou’s example (Subsection 2.4.4)
suggests that the price of anarchy grows with the “steepness” of the network latency
functions. Because of this phenomenon, much of Chapter 3 can be seen as
a struggle to formulate an appropriate notion of “steepness” that makes this dependence
precise (culminating in the definition of the anarchy value of a latency
function in Subsection 3.3.1). In stating and proving Theorem A.1.1, we made another
attempt at quantifying the steepness of a latency function. We record this
attempt in the following definition.


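Definition A.2.1 The incline $\Gamma(\ell)$ of a latency function $\ell$ is

$$\Gamma(\ell) = \sup_{x > 0} \frac{x \cdot \ell(x)}{\int_0^x \ell(t)\,dt},$$

with the interpretation $0/0 = 1$. The incline $\Gamma(G, r, \ell)$ of instance $(G, r, \ell)$ is

$$\Gamma(G, r, \ell) = \max_{e \in E} \Gamma(\ell_e).$$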

Since latency functions are nondecreasing, the incline of any latency function
(and hence of any instance) is at least 1. If instance $(G, r, \ell)$ has incline at most $\gamma$,
then we will call $(G, r, \ell)$ $\gamma$-inclined. Thus Theorem A.1.1 can be succinctly stated:
the price of anarchy in $\gamma$-inclined instances is at most $\gamma$.


A.2.2 Steepness

Definition A.2.1 is not aesthetically appealing and can be motivated only via the
proof of Theorem A.1.1: the incline of an instance measures the discrepancy between
the different objective functions minimized by Nash and optimal flows. There is also
a slightly weaker yet somewhat more intuitive version of incline, which we introduce
next. To do so, we must recall two notions from Section 2.3. The first is that
of a standard latency function (see Definition 2.3.5), and the second is that of a
marginal cost function, which for a differentiable latency function $\ell$ is defined by
$\ell^*(x) = \frac{d}{dy}\bigl(y \cdot \ell(y)\bigr)(x) = \ell(x) + x \cdot \ell'(x)$.
[Figure A.1: A latency function with large steepness but moderate incline (horizontal axis: flow)]


Definition A.2.2 The steepness $\Sigma(\ell)$ of a standard latency function $\ell$ is

$$\Sigma(\ell) = \sup_{x > 0} \frac{\ell^*(x)}{\ell(x)},$$

with the interpretation $0/0 = 1$. The steepness $\Sigma(G, r, \ell)$ of an instance $(G, r, \ell)$ with
standard latency functions is

$$\Sigma(G, r, \ell) = \max_{e \in E} \Sigma(\ell_e).$$

Remark A.2.3

(a) If instance $(G, r, \ell)$ has steepness at most $\sigma$, we will call $(G, r, \ell)$ $\sigma$-steep.

(b) The steepness of a latency function is bounded below by its incline.

(c) From the previous observation, a $\sigma$-steep instance is $\sigma$-inclined; by Theorem
A.1.1, it follows that the price of anarchy in $\sigma$-steep instances is at most
$\sigma$.

(d) Some latency functions (such as polynomials) have equal steepness and incline;
in general, however, the steepness of a latency function can far exceed its
incline. This fact is illustrated in Figure A.1, which shows a latency function
with large steepness but moderate incline. It is this picture that inspires our
terminology; very roughly speaking, a latency function that increases sharply
even at a single point is steep (by our definition), while only a latency function
whose graph has a large (global) increase in “elevation” can have large incline.
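
The two quantities can be compared numerically for polynomial latency functions. The sketch below is an illustration under stated assumptions: a polynomial is represented by a hypothetical coefficient list `coeffs` (so that $\ell(x) = \sum_i \texttt{coeffs}[i] \cdot x^i$ with nonnegative coefficients, not all zero), and the suprema are only approximated on a finite grid `xs`.

```python
def latency(coeffs, x):
    """Evaluate l(x) = sum_i coeffs[i] * x**i."""
    return sum(a * x ** i for i, a in enumerate(coeffs))


def incline_ratio(coeffs, x):
    """x * l(x) divided by the integral of l over [0, x] (closed form)."""
    integral = sum(a * x ** (i + 1) / (i + 1) for i, a in enumerate(coeffs))
    return x * latency(coeffs, x) / integral


def steepness_ratio(coeffs, x):
    """Marginal cost l*(x) = l(x) + x * l'(x), divided by l(x)."""
    marginal = sum((i + 1) * a * x ** i for i, a in enumerate(coeffs))
    return marginal / latency(coeffs, x)


def sup_on_grid(ratio, coeffs, xs):
    """Grid approximation (a lower bound) of the supremum over x > 0."""
    return max(ratio(coeffs, x) for x in xs)


xs = [10 ** (k / 10) for k in range(-30, 31)]  # sample points from 1e-3 to 1e3

# Monomial 2 * x**3: both ratios equal p + 1 = 4 at every x > 0.
print(incline_ratio([0, 0, 0, 2.0], 1.7), steepness_ratio([0, 0, 0, 2.0], 1.7))

# l(x) = 1 + x**2: both grid suprema approach 3, with steepness >= incline.
print(sup_on_grid(incline_ratio, [1.0, 0.0, 1.0], xs),
      sup_on_grid(steepness_ratio, [1.0, 0.0, 1.0], xs))
```

A grid can only underestimate a supremum, so this illustrates parts (b) and (d) of the remark for these particular polynomials rather than proving them.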

Why bother defining steepness, which seems similar to but weaker than the
notion of incline? First, we believe the steepness of an instance to have a more
natural interpretation than the incline. Recall from Corollary 2.3.2 that the optimal
flow in a network with standard latency functions is nothing more than a flow at
Nash equilibrium with respect to the marginal cost functions $\ell^*$; thus, the steepness
of an instance simply measures the worst-case discrepancy between how Nash
and optimal flows evaluate the cost of increasing flow on an edge (cf. the hard-to-interpret
objective function minimized by a flow at Nash equilibrium). Second,
we will see in Section A.3 that the steepness of an instance controls the potential
“unfairness” of an optimal flow (recall the example of Subsection 2.4.5).

A.3 How Unfair is Optimal Routing?

We saw in Subsection 2.4.5 that optimal flows, while minimizing the total latency,
may lack desirable fairness properties: specifically, some traffic in a minimum-latency
flow may be routed on paths with larger latency than that incurred by all
traffic in a Nash flow. This drawback of routing traffic optimally has inspired practitioners
to find traffic assignments that minimize total latency subject to explicit
length constraints [90], which require that no network users experience much more
latency than in a flow at Nash equilibrium. The question we study in this section
is the following: how much worse off can network users be in an optimal flow than
in one at Nash equilibrium?

For the rest of this section, we will confine ourselves to instances in which all
traffic shares a common source and destination. Define the unfairness of such an
instance $(G, r, \ell)$ as the maximum ratio between the latency of a flow path of an
optimal flow for $(G, r, \ell)$ and that of a flow path of a Nash flow for $(G, r, \ell)$. We
denote the unfairness of instance $(G, r, \ell)$ by $u(G, r, \ell)$.

Our first observation is that $u(G, r, \ell)$ can be arbitrarily large if we do not place
additional restrictions on the class of allowable latency functions. To see this, modify
the example of Subsection 2.4.5 as follows: for any positive integer $p$, define the
latency of the first edge as the constant function $\ell(x) = (p + 1)(1 - \epsilon)$ and that of
the second edge as $\ell(x) = x^p$. In this example, $u(G, r, \ell) = (p + 1)(1 - \epsilon)$, which
tends to $+\infty$ with $p$.
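
The value $(p + 1)(1 - \epsilon)$ can be reproduced with a short computation. The sketch below assumes the example is a two-node, two-link network with traffic rate 1, and that $(p + 1)(1 - \epsilon) \ge 1$ so that the Nash flow routes everything on the second edge; the function name and return values are illustrative.

```python
def unfairness_example(p, eps):
    """Unfairness of the assumed two-link instance with rate 1.

    Edge 1 has constant latency c = (p + 1) * (1 - eps); edge 2 has latency x**p.
    Returns (unfairness, optimal flow on edge 2, optimal cost).
    """
    c = (p + 1) * (1 - eps)
    if c < 1:
        raise ValueError("need c >= 1 so that the Nash flow uses only edge 2")
    nash_latency = 1.0  # all traffic on edge 2, latency 1**p
    # Optimal flow equalizes marginal costs: (p + 1) * x2**p = c.
    x2 = (1 - eps) ** (1.0 / p)
    x1 = 1 - x2
    opt_cost = x1 * c + x2 ** (p + 1)
    # Edge 1 is the slowest flow path of the optimal flow whenever it carries flow.
    worst_opt_latency = c if x1 > 0 else x2 ** p
    return worst_opt_latency / nash_latency, x2, opt_cost


u, x2, cost = unfairness_example(3, 0.1)
print(u, x2, cost)  # u is (p + 1) * (1 - eps) up to rounding
```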

In  the  spirit  of  our  work  bounding  the  price  of  anarchy,  we  aim  to  quantify  the 
worst  possible  unfairness  as  a  function  of  the  class  of  allowable  latency  functions. 
We  have  already  formulated  the  appropriate  notion  of  “steepness”  for  quantifying 
the  unfairness  of  optimal  flows  in  instances  with  standard  latency  functions  in  the 
previous section; namely, the notion of steepness given in Definition A.2.2.
Theorem A.3.1 If $(G, r, \ell)$ is an instance with a single source-destination pair and
standard latency functions, then

$$u(G, r, \ell) \le \Sigma(G, r, \ell).$$

Proof. Let $(G, r, \ell)$ be an instance with source $s$, destination $t$, standard latency
functions, and steepness $\sigma$. Suppose $f$ and $f^*$ are Nash and optimal flows for
$(G, r, \ell)$, respectively. We need to show that the maximum latency of a flow path of
$f^*$ is at most $\sigma$ times the latency of a flow path of $f$.

Suppose for contradiction that $P_1$, $P_2$ are $s$-$t$ paths satisfying $f_{P_1} > 0$, $f^*_{P_2} > 0$,
and $\ell_{P_2}(f^*) > \sigma \cdot \ell_{P_1}(f)$. Since $f$ is at Nash equilibrium for $(G, r, \ell)$, by Proposition
2.2.2 all flow paths of $f$ have a common latency $L$ with respect to latency
functions $\ell$. Similarly, by Corollary 2.3.2 all flow paths of $f^*$ have a common latency
$L^*$ with respect to latency functions $\ell^*$.

Now, as every latency function is nondecreasing, we have $\ell_e(x) \le \ell^*_e(x)$ for all $e$
and $x$. Thus, we may derive

$$L = \ell_{P_1}(f) < \frac{1}{\sigma}\,\ell_{P_2}(f^*) \le \frac{1}{\sigma}\,\ell^*_{P_2}(f^*) = \frac{L^*}{\sigma}.$$

By Proposition 2.2.4, the cost of the flow $f$ is $C(f) = rL$. The cost of the optimal
flow $f^*$ is not so easy to compute, as flow paths have equal latency with respect to
functions $\ell^*$ but not with respect to $\ell$. However, since every latency function has
steepness at most $\sigma$, we obtain $\ell^*_P(f^*) \le \sigma \cdot \ell_P(f^*)$ for every path $P$ and hence

$$C(f^*) \ge \frac{1}{\sigma} \sum_{P \in \mathcal{P}} \ell^*_P(f^*) f^*_P = \frac{1}{\sigma}\, r L^* > rL = C(f),$$

which contradicts the optimality of $f^*$. ■

For example, an instance whose latency functions are polynomials with nonnegative
coefficients of degree at most $p$ has unfairness at most $p + 1$.

Remark A.3.2 Theorem A.3.1 is not sharp on all instances (for a trivial case,
take $G$ to be a single link with latency function $\ell(x) = x$). However, the theorem
is best possible in the following sense: for any real number $c > 1$, there is an
instance $(G, r, \ell)$ with standard latency functions satisfying $\Sigma(G, r, \ell) \le c$ (namely,
the example given at the beginning of this section with $p$ everywhere replaced by
$c - 1$) with unfairness arbitrarily close to $c$.
Remark A.3.3 Theorem A.3.1 and the previous remark provide an analogue of
our work in Sections 3.3-3.4 showing that the price of anarchy is independent of
the network topology. Let $\mathcal{L}$ denote a standard class of latency functions including
the constant functions, and define the steepness $\Sigma(\mathcal{L})$ by $\Sigma(\mathcal{L}) = \sup_{\ell \in \mathcal{L}} \Sigma(\ell)$.
Then $\sup_{(G, r, \ell)} u(G, r, \ell)$ (where the supremum ranges over instances with a single
source-destination pair and latency functions in $\mathcal{L}$) is precisely $\Sigma(\mathcal{L})$, with worst-case
examples furnished by networks of two parallel links. In fact, it is not difficult to see
that this statement remains true with the weaker assumptions that $\mathcal{L}$ is standard
and is diverse in the sense that $\{\ell(0) : \ell \in \mathcal{L}\} = (0, \infty)$ (cf. Theorem 3.4.4, where
more than two links are required for worst-case examples of the inefficiency of Nash
flows).

A further generalization (in the spirit of Section 3.5) is the following: if $\mathcal{L}$ is
standard and contains a latency function that is positive when evaluated with zero
congestion, then the worst-case unfairness of optimal flows (with respect to $\mathcal{L}$) is
achieved (modulo an arbitrarily small additive factor) in subdivisions of a two-node,
two-link network. Some assumption on $\mathcal{L}$ is necessary for this sort of result; indeed,
any network with latency functions drawn from $\mathcal{L}_p = \{a x^p : a > 0\}$ has steepness
$p + 1$ but unfairness 1 (by a straightforward generalization of Corollary 3.2.2).
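
The last claim can be illustrated on the simplest case. The sketch below assumes a network of two parallel links with latencies $a_1 x^p$ and $a_2 x^p$ and compares the Nash split (obtained in closed form by equalizing latencies) with the optimal split (obtained by a ternary search on the convex total cost). The helper names are hypothetical, and two links are only a special case of the general statement.

```python
def nash_split(a1, a2, p, r=1.0):
    """Flow on link 1 that equalizes a1 * x**p and a2 * (r - x)**p."""
    w1, w2 = a1 ** (-1.0 / p), a2 ** (-1.0 / p)
    return r * w1 / (w1 + w2)


def optimal_split(a1, a2, p, r=1.0, iters=200):
    """Flow on link 1 minimizing total cost, found by ternary search."""
    def cost(x):
        return a1 * x ** (p + 1) + a2 * (r - x) ** (p + 1)

    lo, hi = 0.0, r
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if cost(m1) < cost(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


# The two splits agree up to numerical error, so every flow path of the optimal
# flow has the same latency as in the Nash flow (unfairness 1 in this instance).
print(nash_split(1.0, 3.0, 2), optimal_split(1.0, 3.0, 2))
```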


Appendix  B 

A  Collection  of  Counterexamples 


B.1 Necessity of Continuous, Nondecreasing Latency
Functions for Nash Flows

In this section, we provide several examples demonstrating that the useful properties
of flows at Nash equilibrium presented in Chapter 2, such as existence and uniqueness,
fail if we allow edge latency functions to be discontinuous or nonmonotone.

The next two propositions study networks with latency functions that are nondecreasing
but not continuous. We first show that Nash flows need not exist in such
networks.


Proposition B.1.1 There is a network $G$ with nondecreasing discontinuous latency
functions $\ell$ and a traffic rate $r$ such that $(G, r, \ell)$ fails to admit a feasible flow at
Nash equilibrium.


Proof. Let $G$ denote a two-node two-link network. Define one latency function by
$\ell_1(x) = 1$ and another by

$$\ell_2(x) = \begin{cases} x & \text{if } x < 1 \\ 2 & \text{if } x \ge 1. \end{cases}$$

It is evident that both latency functions are nondecreasing and that no flow feasible
for $(G, 1, \ell)$ meets the definition of a Nash flow set forth in Definition 2.2.1. ■
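
The claim can be spot-checked on a grid of candidate flows. The sketch below rests on two assumptions that are not spelled out in this appendix: that the second latency function switches from $x$ to 2 exactly at $x = 1$, and that Definition 2.2.1 is the deviation-based condition (no positive amount of traffic on one link can strictly lower its latency by moving to the other link). Only sampled deviation sizes are tested, so this is an illustration rather than a proof.

```python
from fractions import Fraction


def l1(x):
    """Constant latency of the first link."""
    return Fraction(1)


def l2(x):
    """Discontinuous latency of the second link (threshold at 1 is assumed)."""
    return x if x < 1 else Fraction(2)


def is_nash(x2, steps=100):
    """Sampled deviation test for the split (1 - x2, x2) of one unit of traffic."""
    flow = [1 - x2, x2]
    lat = [l1, l2]
    for i in (0, 1):
        j = 1 - i
        if flow[i] == 0:
            continue
        for k in range(1, steps + 1):
            delta = flow[i] * Fraction(k, steps)  # amount moved from link i to link j
            if lat[i](flow[i]) > lat[j](flow[j] + delta):
                return False
    return True


def nash_flows_on_grid(n=100):
    """All grid splits x2 = k/n that pass the sampled Nash test."""
    return [Fraction(k, n) for k in range(n + 1) if is_nash(Fraction(k, n))]


print(nash_flows_on_grid())  # no grid point passes the test
```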


The  next  proposition  demonstrates  that  networks  with  discontinuous  latency 
functions  can  admit  Nash  flows  with  distinct  costs. 
Proposition B.1.2 There is a network $G$ with nondecreasing discontinuous latency
functions $\ell$ and a traffic rate $r$ such that $(G, r, \ell)$ admits two feasible flows at Nash
equilibrium with different costs.


Proof. Define a network $G$ as in the previous proposition. Define the first latency
function $\ell_1$ by


£flx) 


0  if  x  G  [0,  tt] 
1  if  x  >  | 


and  the  second  by 


W)  = 


if  x  G  [0,  |] 

if  x  >  h. 


Again  set  the  traffic  rate  r  to  be  1.  One  flow  at  Nash  equilibrium  routes  ^  of  the 
flow  on  the  first  edge  and  the  rest  on  the  second,  for  a  cost  of  |;  another  routes  | 
of the flow on the first edge and the rest on the second, for a cost of |. ■


Remark  B.1.3  The  previous  example  also  shows  that  the  characterization  of  Nash 
flows  given  in  Proposition  2.2.2  fails  when  network  latency  functions  need  not  be 
continuous:  Nash  flows  in  such  networks  need  not  route  all  flow  on  paths  having 
minimum-latency. 

We next consider networks in which latency functions are continuous but are not
assumed to be nondecreasing. While Nash flows still exist (as the proof of Proposition
2.5.1 shows), they are no longer unique in any sense. This is demonstrated in
the next proposition.


Proposition B.1.4 There is a network $G$ with nonmonotone continuous latency
functions $\ell$ and a traffic rate $r$ such that $(G, r, \ell)$ admits two feasible flows at Nash
equilibrium with different costs.

Proof. Let $G$ denote a network with two nodes and three parallel links. Endow
the first two links with the latency function $\ell(x) = (x - \frac{1}{2})^2 + 1$ and the third with
latency function $\ell(x) = \max\{2 - 2x, 0\}$. Then $(G, 1, \ell)$ admits a Nash flow that
routes half the traffic on each of the first two links (for a cost of 1) and another
Nash flow that routes all traffic on the third link (for a cost of 0). ■
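
Both equilibria can be verified with a small checker for parallel-link networks. The sketch assumes the deviation-based Nash condition (traffic on a link cannot strictly lower its latency by moving any sampled amount to another link); the names `is_nash_parallel`, `bowl`, `steep`, and `mild` are illustrative, and `mild` is the latency $\max\{1 - x, 0\}$ of the variant with a less severe third link.

```python
def is_nash_parallel(latencies, flow, steps=1000, tol=1e-12):
    """Sampled deviation test for a flow on parallel links."""
    for i, fi in enumerate(flow):
        if fi <= 0:
            continue
        current = latencies[i](fi)
        for j, fj in enumerate(flow):
            if j == i:
                continue
            for k in range(1, steps + 1):
                delta = fi * k / steps  # traffic moved from link i to link j
                if current > latencies[j](fj + delta) + tol:
                    return False
    return True


def cost(latencies, flow):
    """Total latency: sum over links of latency times flow."""
    return sum(l(x) * x for l, x in zip(latencies, flow))


def bowl(x):
    return (x - 0.5) ** 2 + 1


def steep(x):
    return max(2 - 2 * x, 0.0)


def mild(x):
    return max(1 - x, 0.0)


links = [bowl, bowl, steep]
print(is_nash_parallel(links, [0.5, 0.5, 0.0]), cost(links, [0.5, 0.5, 0.0]))
print(is_nash_parallel(links, [0.0, 0.0, 1.0]), cost(links, [0.0, 0.0, 1.0]))
# With the milder third link, the equal split is no longer an equilibrium.
print(is_nash_parallel([bowl, bowl, mild], [0.5, 0.5, 0.0]))
```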

Remark B.1.5 A variant on the previous example shows that the characterization
of Nash flows given in Proposition 2.2.2 fails in networks with nonmonotone latency
functions. To see this, replace the latency function on the third edge of the network
above by the less severe latency function $\ell(x) = \max\{1 - x, 0\}$. The flow that routes
half of the traffic on each of the first two edges equalizes the latency of all three
edges at 1 but is not at Nash equilibrium; any traffic would be better off by rerouting
itself on the third link.


[Figure B.1: Theorem 4.1.3 is sharp. Edge labels visible in the figure: 2, $1 - \epsilon$, $1 - \epsilon$.]


B.2  Theorem  4.1.3  is  Sharp 

In Section 4.1 we defined the notion of a flow at $\epsilon$-approximate Nash equilibrium
and showed in Theorem 4.1.3 that if $f$ is at $\epsilon$-approximate Nash equilibrium for the
instance $(G, r, \ell)$ and $f^*$ is feasible for $(G, 2r, \ell)$, then $C(f) \le \frac{1+\epsilon}{1-\epsilon} C(f^*)$. We now
show that the factor of $\frac{1+\epsilon}{1-\epsilon}$ cannot be improved in general network topologies.

Fix $\epsilon \in (0, 1)$ and consider the network $G$ shown in Figure B.1 (with topology
identical to that of Figure 4.1, namely the Braess Paradox graph of Figure 2.2 with a
direct $s$-$t$ edge added). Four of the edges have constant latency functions, as shown,
and by $\ell(x)$ we mean a nondecreasing, continuous function equal to 0 on $[0, 1 - \delta]$
and to $1 + \epsilon$ on $[1, \infty)$ (where $\delta > 0$ is arbitrarily small). The flow $f$ routing 1 unit
of flow on the three-hop path $s \to v \to w \to t$ is at $\epsilon$-approximate Nash equilibrium
for $(G, 1, \ell)$ and has cost $2(1 + \epsilon)$. On the other hand, the flow $f^*$ routing $1 - \delta$ units
of flow on each of the two-hop paths and $2\delta$ units of flow on the $s$-$t$ edge is feasible
and has cost approaching $2(1 - \epsilon)$ as $\delta \to 0$.
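
The two costs can be reproduced numerically. The sketch below makes several assumptions because the figure itself is not available here: the edges $(s, v)$ and $(w, t)$ carry the function $\ell$, the edge $(v, w)$ has latency 0, the edges $(s, w)$ and $(v, t)$ have latency $1 - \epsilon$, the direct $s$-$t$ edge has latency 2, and $\ell$ ramps linearly between $1 - \delta$ and 1. It also takes $\epsilon$-approximate Nash equilibrium to mean that every used path has latency at most $(1 + \epsilon)$ times that of any other path.

```python
def ell(x, eps, delta):
    """Continuous nondecreasing latency: 0 on [0, 1 - delta], 1 + eps on [1, inf)."""
    if x <= 1 - delta:
        return 0.0
    if x >= 1:
        return 1 + eps
    return (1 + eps) * (x - (1 - delta)) / delta  # linear ramp (an assumption)


def costs(eps, delta):
    """Costs of f (rate 1, three-hop path) and f* (rate 2) in the assumed network."""
    cost_f = 1 * (ell(1, eps, delta) + 0 + ell(1, eps, delta))
    # f*: 1 - delta units on each two-hop path, 2 * delta units on the direct edge.
    two_hop_latency = ell(1 - delta, eps, delta) + (1 - eps)
    cost_f_star = 2 * (1 - delta) * two_hop_latency + 2 * delta * 2
    return cost_f, cost_f_star


def f_is_eps_nash(eps, delta, tol=1e-12):
    """Check the assumed eps-approximate Nash condition for the three-hop flow f."""
    three_hop = 2 * ell(1, eps, delta)
    alternatives = [ell(1, eps, delta) + (1 - eps),  # s -> v -> t
                    (1 - eps) + ell(1, eps, delta),  # s -> w -> t
                    2.0]                             # direct s-t edge
    return three_hop <= (1 + eps) * min(alternatives) + tol


cost_f, cost_f_star = costs(0.2, 1e-6)
print(f_is_eps_nash(0.2, 1e-6), cost_f / cost_f_star)  # ratio near (1 + eps) / (1 - eps)
```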

On the other hand, the factor of $\frac{1+\epsilon}{1-\epsilon}$ can be improved to $1 + \epsilon$ in networks of
parallel links.

Proposition B.2.1 Let $G$ be a network of parallel links and $f$ at $\epsilon$-approximate
Nash equilibrium for $(G, r, \ell)$. If $f^*$ is feasible for $(G, 2r, \ell)$, then $C(f) \le (1 + \epsilon) C(f^*)$.
Proof. Let $L$ be the minimum latency of any edge with respect to $f$; since $G$ is
a network of parallel links and $f$ is at $\epsilon$-approximate Nash equilibrium, we have
$C(f) \le (1 + \epsilon) r L$. Let $E_1$ be the edges $e$ of $G$ for which $f^*_e < f_e$, and $E_2$ the rest of
the edges. Since $G$ is a network of parallel links, $f^*$ routes less than $\sum_{e \in E_1} f_e \le r$
units of flow on edges in $E_1$. Thus, $f^*$ routes at least $2r - r = r$ units of flow on
edges of $E_2$. Since $f^*_e \ge f_e$ for all edges $e \in E_2$, we have $\ell_e(f^*_e) \ge \ell_e(f_e) \ge L$ for
all $e \in E_2$. We have shown that at least $r$ units of $f^*$ experience at least $L$ units of
latency; hence $C(f^*) \ge rL \ge C(f)/(1 + \epsilon)$. ■

B.3  Stackelberg  Routing  in  General  Networks 

In Section 6.4, we proved that in networks of parallel links a carefully chosen Stackelberg
strategy induces a flow with cost no more than $1/\beta$ times that of the minimum-latency
flow, where $\beta$ is the fraction of the traffic that is centrally controlled. In this
section we show that this guarantee cannot be extended to more general network
topologies. Specifically, we have the following bad example in the graph of Braess’s
Paradox (Figure 2.2).

Proposition B.3.1 There is a Stackelberg instance $(G, r, \ell, \beta)$ in which no flow
induced by a Stackelberg strategy has cost at most $C(f^*)/\beta$, where $f^*$ is an optimal
flow for $(G, r, \ell)$.

Proof. Let G be the graph of Braess’s Paradox (see Figure B.2). Let the latency
functions of edges (s,w) and (v,t) be ℓ(x) = 1 and of (v,w) be ℓ(x) = 0 (as
in Braess’s Paradox). Define the latency function of the remaining two edges by
ℓ(x) = f(x), where f(x) = 0 on [0, | − ε], f(x) = 1 − ε on [|, ∞), and f(x) is defined
arbitrarily on (| − ε, |) subject to the usual continuity and monotonicity restrictions
(where ε > 0 is arbitrarily small). The flow feasible for (G, 1, ℓ) routing | − 2ε units
of flow on the three-hop path and | + ε units of flow on each of the two-hop paths
has cost approaching | as ε → 0.

Now  consider  the  Stackelberg  instance  (G,  1,£,  |),  and  any  Stackelberg  strategy 
$f$. We must show that the flow induced by $f$ has large cost. We first observe
that, for any strategy $f$, all selfish traffic will be routed on the three-hop path
$s \to v \to w \to t$. All that remains is a simple case analysis.

Case 1: Suppose f routes at least \ units of flow both on edge (s,v) and on edge
(w,t). By the previous observation, the flow induced by f routes at least | units of
traffic on each of these edges and thus its cost is at least |(1 − ε).


[Figure B.2: A bad example for Stackelberg routing]

Case 2: Suppose f routes less than \ units of flow on edge (w,t); then at least \
units of flow are routed on the path s → v → t. This implies that the congestion
on edge (s,v) is at least The cost of the flow induced by f must then be at least
| − ε (with at least 1 − ε latency incurred on arcs (s,v) and (s,w) and at least |
latency incurred on arc (v,t)).

Case 3: If neither case 1 nor case 2 occurs, then f routes less than | units of
flow on the edge (s,v) and hence at least \ units of flow on the path s → w → t.
Symmetric to the previous case, this implies that the cost of the flow induced by f
is at least | − ε.

We have shown that, as ε → 0, the cost of the minimum-latency flow for (G, 1, ℓ)
tends to (at most) | while the total latency of the min-cost flow induced by some
Stackelberg strategy tends to | > The proof is complete. ■

Remark B.3.2 With only a little more work, the counterexample of Proposition
B.3.1 can be modified to possess standard (or even convex) latency functions.

B.4  LLF  is  Not  Optimal 

In  this  section  we  demonstrate  that  the  LLF  strategy  of  Chapter  6  need  not  be  the 
optimal  Stackelberg  strategy,  even  in  networks  of  parallel  links  with  linear  latency 
functions. 

To see this, consider a network G with two nodes and three edges, with latency
functions $\ell_1(x) = x$, $\ell_2(x) = 1 + x$, and $\ell_3(x) = 1 + x$. In the instance (G, 1, $\ell$, |),
the optimal flow routes | of the traffic on the first edge and splits the remaining
traffic equally between the last two edges. The LLF strategy thus routes the | units
of centrally controlled flow on the third edge, inducing a flow with | of the flow on
the first edge and the rest on the third, for a cost of |. On the other hand, the
Stackelberg strategy that routes pj units of flow on each of the last two edges induces a
flow  with  cost 